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Quantum Monte Carlo (QMC) methods are among the most accurate for computing 
ground state properties of quantum systems. The two major types of QMC we use are Vari- 
ational Monte Carlo (VMC), which evaluates integrals arising from the variational princi- 
ple, and Diffusion Monte Carlo (DMC), which stochastically projects to the ground state 
from a trial wave function. These methods are applied to a system of boson hard spheres 
to get exact, infinite system size results for the ground state at several densities. 

The kinds of problems that can be simulated with Monte Carlo methods are expanded 
through the development of new algorithms for combining a QMC simulation with a classi- 
cal Monte Carlo simulation, which we call Coupled Electronic-Ionic Monte Carlo (CEIMC). 
The new CEIMC method is applied to a system of molecular hydrogen at temperatures 
ranging from 2800K to 4500K and densities from 0.25 to 0.46 g/cm^. 

VMC requires optimizing a parameterized wave function to find the minimum energy. 
We examine several techniques for optimizing VMC wave functions, focusing on the ability 
to optimize parameters appearing in the Slater determinant. 

Classical Monte Carlo simulations use an empirical interatomic potential to compute 
equilibrium properties of various states of matter. The CEIMC method replaces the empir- 
ical potential with a QMC calculation of the electronic energy. This is similar in spirit to 
the Car-Parrinello technique, which uses Density Functional Theory for the electrons and 
molecular dynamics for the nuclei. 

The challenges in constructing an efficient CEIMC simulation center mostly around the 
noisy results generated from the QMC computations of the electronic energy. We introduce 
two complementary techniques, one for tolerating the noise and the other for reducing 
it. The penalty method modifies the Metropolis acceptance ratio to tolerate noise without 
introducing a bias in the simulation of the nuclei. For reducing the noise, we introduce 
the two-sided energy difference method, which uses correlated sampling to compute the 
energy change associated with a trial move of the nuclear coordinates. Unlike the standard 
reweighting method, it remains stable as the energy difference increases. 
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Chapter 1 
Introduction 



The first computer simulations of a condensed matter system used the simplest potential, 
the hard sphere (Metropolis et ah, 1953). As computers and simulations progressed, more 
sophisticated and realistic potentials came into use. These potentials are parameterized 
and then fit to reproduce various experimental quantities. Both Molecular Dynamics (MD) 
and Monte Carlo (MC) methods are used to generate ensemble averages of many-particle 
systems. 

These potentials originate from the microscopic structure of matter, described in terms 
of electrons, nuclei, and the Schrodinger equation. But the many-body Schrodinger equa- 
tion is too complicated to solve directly, so some approximations are needed. The one 
electron approximation is a successful approach, where a single electron interacts with an 
external potential (ie, the nuclei) and with a mean field generated by all the other electrons. 
This is done by Hartree-Fock (HF) or with Density Functional Theory (DFT) (Parr and 
Yang, 1989). DFT is in principle exact, but contains an unknown exchange and correla- 
tion functional that must be approximated. The most common one is the Local Density 
Approximation (LDA). 

These first principles calculations are used in fitting the potentials, which are then used 
in an MC or MD computation. But the problem of transferability still remains. Empirical 
potentials are only valid in situations for which they have been designed and fitted. 

In 1985, Car and Parrinello introduced their method, which replaced the empirical po- 
tential with a DFT calculation done 'on the fly' (Car and Parrinello, 1985). They did a 
molecular dynamics simulation of the nuclei of liquid silicon and then computed the den- 
sity functional energy of the electrons at every MD step. To improve the efficiency of the 
computation of the DFT energy, they introduced a new iterative method for solving the 
DFT equations. It has been a very successful method, with the original paper being cited 
over 2300 times since its publication. 
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Previously, the DFT equations had been solved by eigenvalue methods. But eigenvalue 
problems can also be regarded as optimization problems, where an energy functional is 
minimized. Car and Parrinello used an idea similar to simulated annealing, but they used 
molecular dynamics to move through parameter space, rather than Monte Carlo. This had 
the effect of making the equations of motion similar between the electronic problem and the 
nuclear problem, with the only difference being in the relative masses. Since the electronic 
problem was not real electron dynamics, the electron mass does not correspond to any 
physical quantity, and is only a parameter controlling the convergence of the electronic part 
of the simulation. Since then, other iterative methods have been introduced, usually based 
on the Conjugate- Gradient method (Payne et al, 1992). 

A brief review of applications of the Car- Parrinello method to liquid problems is given 
by Sprik (2000). This review also mentions that LDA and some other functionals are not 
good enough to accurately simulate water (there are improved functionals that are accept- 
able). Another review of molecular dynamics by Tuckerman and Martyna (2000) includes 
material on treating the nuclei classically and also using path integrals to treat the nuclei 
quantum mechanically, done by Marx and Parrinello (1996). 

Quantum Monte Carlo (QMC) methods have developed as another means for accurately 
solving the many body Schrodinger equation (Hammond et ah, 1994; Anderson, 1995; 
Ceperley and Mitas, 1996). The success of QMC lies partly in the fact these methods 
explicitly include correlation among the electrons, which can not be done directly with the 
one electron methods. Particularly with the Local Density Approximation (LDA), DFT has 
known difficulties in handling electron correlation (Grossman et ah, 1995). 

In the spirit of the Car-Parrinello method, we integrate a Classical Monte Carlo sim- 
ulation of the nuclei with a QMC simulation for the electrons. This we call Coupled 
Electronic-Ionic Monte Carlo (CEIMC). There are some challenges in constructing an ef- 
ficient method. 

The first problem we encounter is that the results of a QMC simulation are noisy. The 
QMC energy has some uncertainty associated with it, and it could bias the classical part of 
the simulation. We could run the QMC simulation until the noise is negligible, but that is 
very time-consuming. A better way is use the penalty method, which modifies the usual 
MC formulas to be tolerant of noise. 

The electrons are assumed to be in their ground state, both in the Car-Parrinello method 
and in our CEIMC method. There are two internal effects that could excite the electrons 
- coupling to nuclear motion and thermal excitations. In the first case, we make the Born- 
Oppenheimer approximation, where the nuclei are so much more massive than the electrons 
that the electrons are assumed to respond to nuclear motion instantaneously, and so stay in 
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their ground state. We neglect any occupation of excited states of the electrons due to 
coupling to nuclear motion. 

In the case of thermal excitation, let us examine several relevant energy scales. If we 
consider a gas of degenerate electrons at a density ofn — 0.0298 electrons per cubic Bohr 

1 /3 

(i.e. rs = (4^;^) = 2.0), the Fermi temperature is about 140,000K. The gap between the 
ground state and the first excited state of a hydrogen molecule at equilibrium bond distance 
is about 124,000K. As long as our temperatures are well below this (and they are), and we 
are not at too high pressures (pressure decreases the gap), the thermal occupation of excited 
states can be neglected. 

Hydrogen is the most abundant element in the universe, making an understanding of its 
properties important, particularly for astrophysical applications. Models of the interiors of 
the gas giant planets depends on a knowledge of the equation of state of hydrogen (Hubbard 
and Stevenson, 1984; Stevenson, 1988). Hydrogen is also the simplest element, but it still 
displays remarkable variety in its properties and phase diagram. It has several solid phases 
at low temperature, and the crystal structure of one of them (phase III) is not fully known 
yet. At high temperature and pressure the fluid becomes metallic, but the exact nature of 
the transition is not known. 

Computer simulation can also be used to obtain results on model systems. We will 
examine the hard sphere Bose gas, a simple and important model. For this model, all 
the approximations we make are controllable, and we will look at how to deal with those 
approximations and obtain exact results for this model. 

1.1 Thesis Overview 

Chapter 2 is an introduction to the basic classical and quantum Monte Carlo techniques 
we will be using. Chapter 3 presents an improved QMC method for computing the energy 
difference between two systems. Chapter 4 is an examination of parameter optimization, 
which is essential in VMC. We present various methods for minimizing the energy, and 
give some comparisons between them. 

Successful CEIMC simulations are based on the penalty method for tolerating noise in 
the Metropolis method, which is detailed in Chapter 5. Some additional details are dis- 
cussed, and an example of CEIMC applied to a single H2 molecule is given. The results of 
computations of the ground state energy of the boson hard sphere model are presented in 
Chapter 6. In Chapter 7, the CEIMC simulation method is applied to fluid molecular hy- 
drogen. We present data for a few state points and perform some analysis of the simulation 
itself. 
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Chapter 2 

Monte Carlo Methods 



Monte Carlo integration methods are very useful for evaluating the basic integrals of sta- 
tistical and quantum physics. In a system with A^^ particles, these integrals have the form 

^ JdRnjRMR) 
^ ^ IdRn{R) ^'-'^ 

where is a 3Np dimensional vector, 'k{R) is a probability distribution, and 0{R) is the 
observable or quantity of interest. These integrals have two important characteristics: high 
dimensionality and the integrands are sharply peaked - only small parts of phase space 
contribute significantly to the integral. 

The high dimensionality makes a grid based scheme impractical in two ways. First, sup- 
pose we have a 300 dimensional integral (100 particle simulation), and want 10 grid points 
in each dimension. Even this crude integration requires function evaluations at IQr'^ grid 
points! Second, consider the trapezoidal rule (as a concrete example) in d dimensions. The 
error using samples will go as 0{N~^/'^). As we will show, the error in Monte Carlo 
mtegration goes as O {N'^l^). The Monte Carlo error is independent of the dimensional- 
ity whereas the grid based method depends on it strongly. For these high dimensionality 
problems, Monte Carlo is only practical choice. 



2.1 Basic Monte Carlo Integration 

Consider an integral of the form 



/= C f{x)dx. (12) 
Jo 



To evaluate by Monte Carlo, compute f{x) at points sampled uniformly from [0, 1]. An 
approximation to / is given by 

!=1 
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The estimate of the statistical error in / will be 



Oi^Of/y/N (2.4) 
where Oy- is the variance, and is given by 

^/=(]^E(/(-0-/f (2.5) 

Thus the error goes as o{N^^/^). 

The error bounds can be improved by sampling more points, or by reducing the vari- 
ance, Oy. The latter can be accomplished with importance sampling. Consider some prob- 
ability, P{x), that is an approximation to f{x). Write Eq. (2.2) as 

/■I fix) 

1= / P{xy-^dx. (2.6) 
The estimate of I is obtained by sampling points from P{x) and computing 

If P is a good approximation to /, then the variance of the sum in Eq. (2.7) is much less 
than the variance of the sum in Eq. (2.3). 

The fact that the integrands of interest are sharply peaked, as mentioned previously, 
makes importance sampling a necessity. The most useful type of importance sampling for 
these problems is the Metropolis method. 



2.2 Metropolis Sampling 

The Metropolis method (Metropolis et al, 1953) uses a Markov process to generate sam- 
ples from a normalized probability distribution, 7t(i?)/ / dR ii{R). These samples are then 
used to estimate Eq. (2.1) by 

0=l^LOiRi) (2.8) 

For generality in the following section, we will denote the state of the simulation by s. 

A Markov process takes a transition probability between states, fp(s — > *'), and con- 
structs a series of state points si,S2,--- (called a chain). An important characteristic of a 
Markov process is that the choice of the next state point in the chain depends only on the 
current state point, not any previous state points. 
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The Metropolis method constructs a transition probabiHty such that generated state 
points are sampled from the desired distribution. For this to work, the transition probability 
must satisfy ergodicity. This means the Markov chain must eventually be able to reach any 
state in the system. A sufficient condition for satisfying ergodicity is detailed balance, 

k{s)(P{s^s')^k{s')(P{s' ^s). (2.9) 

The generalized Metropolis method breaks the transition probability into the product 
of two pieces - an a priori sampling distribution r(s — > s') and an acceptance probability 
A(s — > s'). The Metropolis choice for the acceptance probability is 

A{s — > 5') = min 

The procedure is to sample a trial state, s', according to T{s ^ s') and evaluate Eq. 
(2.10). The acceptance probability is compared with a uniform random number on [0, 1]. 
If A is greater than the random number, the move is accepted, s' becomes the new s and is 
used in the average in Eq (2.8). Otherwise the move is rejected, s is not changed, and is 
reused in the average. 

The original Metropolis procedure chooses a trial position uniformly inside a box cen- 
tered around the current point, 

s' = s+y (2.11) 

where 3; is a uniform random number on [—A/2, A/2] , with A being an adjustable parameter. 
In this case, T is uniform and will cancel out of Eq. (2.10). 

An important measure of the Metropolis procedure is the acceptance ratio - the ratio 
of accepted moves to the number of trial moves. It can be adjusted by the choice of A. 
If the acceptance ratio is too small, state space will be explored very slowly because very 
few moves are accepted. If the acceptance ratio is high, it is likely that the trial moves 
are too small and once again, diffusion through state space will be very slow. Balancing 
these considerations leads to the standard rule of thumb that the optimal acceptance ratio is 
around 50%. 

A better consideration is maximization of the efficiency, 
where T is the computer time taken to get an error estimate of o. 
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(2.10) 
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2.3 Classical Monte Carlo 



The probability distribution we wish to sample is the Boltzmann distribution 



n{s)c<exp[-V{s)/kT]. 



(2.13) 



The first simulations of this type were done with the hard sphere potential (Metropolis 
et ah, 1953; Wood and Jacobson, 1957). Later simulations used Lennard-Jones potentials, 
and then other types of empirical potentials. 

The Metropolis procedure samples only the normalized 7t(s). Averages over this distri- 
bution are readily computed, but quantities that depend on the value of the normalization 
are difficult to compute. In classical systems, this includes quantities such the entropy and 
the free energy. There are techniques, however, for computing the free energy difference 
between two systems. 



Variational Monte Carlo (VMC) is based on evaluating the integral that arises from the 
variational principle. The variational principle states that the energy from applying the 
Hamiltonian to a trial wave function must be greater than or equal to the exact ground 
state energy. Typically the wave function is parameterized and then optimized with respect 
to those parameters to find the minimum energy (or minimum variance of the energy). 
Monte Carlo is needed because the wave function contains explicit two (or higher) particle 
correlations and this results in a non-factoring high dimensional integral. 
The energy is written as 



where Ej^ = and is called the local energy. Other diagonal matrix elements can be 
evaluated in a similar fashion. Off diagonal elements can also be evaluated, but with more 
effort. McMillan (1965) introduced the use of Metropolis sampling for evaluating this 
integral. 

A typical form of the variational wave function is a Jastrow factor (two body correla- 
tions) multiplied by two Slater determinants of one body orbitals. 



2.4 Variational Monte Carlo 



E = 



JdR\\fT{R)H\\fT{R) ^ JdR\\\fT(R)\^EL{R) 
JdR\VT{Rf JdR\\VT{R)\^ 



(2.14) 




(2.15) 
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The Jastrow factor, u, will contain electron-nucleus and electron-electron correlations. Ap- 
pendix B has details on the derivatives that enter into the local energy. 

As two electrons or an electron and a nucleus get close, there is a singularity in the 
Coulomb potential. That singularity needs to be canceled by kinetic energy terms in the 
wave function. This requirement is known as the cusp condition. Details are given in 
Appendix C. 

Techniques for the efficient handling of the determinants were developed by Ceperley 
et al. (1977). The VMC algorithm is implemented so that only single electron trial moves 
are proposed. This causes a change in only one column of the Slater matrix. The new 
determinant and its derivatives can be computed in O (N) operations, given the inverse of 
the old Slater matrix. This inverse is computed once at the beginning of the simulation and 
then updated whenever a trial move is accepted. The update takes 0{N^) operations. By 
comparison, computing the determinant directly takes 0{N^) operations. This technique 
creates a situation where there is more work done for an acceptance than for a rejection. A 
lower acceptance ratio will be faster, since fewer updates need to be performed. Details of 
the updating procedure and some other properties of determinants are given in Appendix A. 

Optimization of the parameters in the wave function is a large topic, so we will defer 
the discussion until a later chapter. However, we will make one observation here. If \\fT is 
an eigenstate, the local energy becomes constant and any MC estimate for the energy will 
have zero variance. This zero-variance principle allows searching for optimum parameters 
by minimizing the variance rather than minimizing the energy. In principle this is true for 
any eigenstate, not just the ground state. 

2.4.1 Two Level Sampling 

A multilevel sampling approach can be used to increase the efficiency of VMC (Dewing, 
2000). Multilevel sampling has been used extensively in path integral Monte Carlo (Ceper- 
ley, 1995). The general idea is to use a coarse approximation to the desired probability 
function for an initial accept/reject step. If it is accepted at this first level, a more accurate 
approximation is used, and another accept/reject step is made. This continues until the 
move is rejected or until the most detailed level has been reached. This method increases 
the speed of the calculation because the entire probability function need not be computed 
every time. 

Consider splitting the wave function into two factors - the single body part (D) and 
the two body part (e^). Treat the single body part as the first level, and the whole wave 
function as the second level. 
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First, a trial move, R', is proposed and accepted with probability 

^i=«^ii^[l:^J (2-16) 

If it is accepted at this stage, the two body part is computed and the trial move is accepted 
with probability 

exp[-2U{R') 



Ai = min 



1, 



(2.17) 



exp[-2f/(i?)] 

It can be verified by substitution that this satisfies detailed balance in Eq. (2.9). After 
an acceptance at this second level, the inverse Slater matrices are updated, as described 
previously. 

We compared the efficiency between the standard sampling method and the two level 
sampling method on two test systems: a single Li2 molecule in free space, and a collection 
of 32 H2 molecules in a periodic box of side 19.344 a.u. (r^ = 3.0). The wave functions are 
taken from Reynolds et al. (1982), and will be described in Section 2.7. 

The step size. A, is the obvious parameter to adjust in maximizing the efficiency. But 
we can also vary the number of steps between computations of El. The Metropolis method 
produces correlated state points (see more on serial correlations in Section 2.6), so succes- 
sive samples of El do not contain much new information. In these tests we sampled El 
every five steps. 

Results for the different sampling methods with Li2 are shown in Tables 2.1 and 2.2. 
The Determinant Time and Jastrow Time columns include only the time needed for com- 
puting the wave function ratio in the Metropolis method, and not the time for computing the 
local energy. The total time column does include the time for computing the local energy. 
The efficiency is also shown on the left in Figure 2.1. 

For the two level method with Li2, the second level acceptance ratio is quite high, 
indicating the single body part is a good approximation to the whole wave function. 

Results for the collection of H2 molecules are given in Tables 2.3 and 2.4. The effi- 
ciency is also shown on the right graph in Figure 2.1. 

Comparing the maximum efficiency for each sampling method, two level sampling is 
39% more efficient than standard sampling for Li2, and 72% more efficient for the collec- 
tion of H2's. 



2.5 Diffusion Monte Carlo 

Diffusion Monte Carlo (DMC) is a method for computing the ground state wave function. 
It typically takes an order of magnitude more computing time than VMC, and is most 
efficient when used in conjunction with a good VMC trial function. 
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Table 2.1: Timings for Li2 molecule using the standard sampling method. All times are in 
seconds on an SGI Origin 2000. 





Acceptance 


Determinant 


Jastrow 


Total 




A 


Ratio 


Time 


Time 


Time 




1.0 


0.610 


48.3 


340 


516 


1190 


1.5 


0.491 


48.1 


340 


508 


1680 


2.0 


0.407 


48.2 


340 


503 


1460 


2.5 


0.349 


48.2 


339 


499 


1070 


3.0 


0.307 


48.2 


339 


496 


800 



Table 2.2: Timings for Li2 molecule using the two level sampling method. All times are in 
seconds on an SGI Origin 2000. 



First Level Second Level Total Acc. 



A 


Acc. Ratio 


Acc. Ratio 


Ratio 


Time 




1.0 


0.674 


0.899 


0.606 


400 


1580 


1.5 


0.543 


0.894 


0.485 


347 


2430 


2.0 


0.447 


0.897 


0.401 


304 


2340 


2.5 


0.379 


0.902 


0.342 


276 


1910 


3.0 


0.331 


0.906 


0.300 


256 


1400 



Table 2.3: Timings for the system of 32 H2 molecules in a periodic box using the standard 
sampling method. All times are in seconds on a Sun Ultra 5. 





Acceptance 


Determinant 


Jastrow 


Total 




A 


Ratio 


Time 


Time 


Time 




2.0 


0.606 


167 


1089 


2015 


0.61 


3.0 


0.455 


167 


1085 


1891 


1.22 


4.0 


0.338 


166 


1084 


1794 


1.23 


5.0 


0.250 


166 


1080 


1722 


1.06 


6.0 


0.185 


164 


1080 


1668 


1.02 


7.0 


0.139 


162 


1084 


1629 


0.76 
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Table 2.4: Timings for a system of 32 H2 molecules in a periodic box using the two level 
sampling method. All times are in seconds on a Sun Ultra 5. 





First Level 


Second Level 


Total Acc. 


Total 




A 


Acc. Ratio 


Acc. Ratio 


Ratio 


Time 




2.0 


0.740 


0.795 


0.589 


1804 


0.59 


3.0 


0.598 


0.728 


0.436 


1421 


1.77 


4.0 


0.468 


0.681 


0.319 


1185 


2.11 


5.0 


0.357 


0.649 


0.232 


994 


1.55 


6.0 


0.370 


0.627 


0.169 


849 


1.87 


7.0 


0.204 


0.609 


0.124 


740 


1.46 



standard sampling 
two level sampling 




Figure 2.1: Efficiency of VMC. The graph on the left is for Li2. The graph on the right is 
for 32 H2 molecules. 
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Fonnally, DMC and related methods work by converting the differential form of the 
Schrodinger equation into an integral equation and solving that integral equation by stochas- 
tic methods. From another point of view, the Schrodinger equation in imaginary time and 
the diffusion equation are very similar, enabling one to use a random process to solve the 
imaginary time Schrodinger equation. This similarity was recognized early and was pro- 
posed as a computational scheme in the early days of computing (Metropolis and Ulam, 
1949). Unfortunately, without importance sampling, it is very inefficient computationally. 

The ground state wave function can be obtained by the projection 

(|)0 = lim e'^^"~^°^\\fT (2. 18) 

X — *oo 

where Eq is the ground state energy. This can be seen by expanding \|/r in energy eigen- 
states. 



^-T(£„-£o)(^^. (2.19) 



At large T, the contribution from the excited states will decay exponentially, and only the 
ground state will remain. To make a practical computation method, we write the projection 
in the position basis as 

\\f{R',t + x) = J dR\\f{R,t)G{R^R';x) (2.20) 

where G = {R'\e~'^^\R) and f{R.,t) is the wave function after some time t. This equation 
is iterated to get to the large time limit. The fully interacting, many-body Green's function 
is too hard to compute, so the various methods differ in how they approximate the full 
projector. In particular, DMC makes a short time approximation, and the resulting pieces 
have natural interpretations in terms of a diffusion process with branching. The name 
Projector Monte Carlo or Green's Function Monte Carlo is often applied to these methods. 
Perhaps unfortunately, the name Green's Function Monte Carlo (GFMC) is also applied to 
a specific technique that uses a spatial domain decomposition for the Green's function. 

For a more detailed presentation of DMC, with importance sampling, we mostly follow 
Reynolds et al. (1982). We start with the Schrodinger equation in imaginary time 

-^-^=[-XV' + V{R)-ET]mt) (2.21) 

where A, =H'^/2m. Importance sampling is added by multiplying Eq. (2.21) by a known trial 
function The result, written in terms of the "mixed distribution" f{R, t) — ?)\/r(i?), 
is 

^/(^'O ^ _;,vV+ {El{R) -ET)f + XV-{fFQ{R)) (2.22) 



dt 
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where El is the local energy, as in VMC, and Fq = 2V\^t/'^t (called the quantum force). 
Once again, the solution for / in terms of a Green's function is 

f{R', t + j dRf{R, t)G{R^ R';x) (2.23) 

For sufficiently short times, we can ignore the non-commutivity of the kinetic and potential 
terms in the Hamiltonian, e~^^ ~ e^^^e^^ . The explicit form for the short time Green's 
function in the position basis is 

G{R^R';x) = (47i;ix)-3^/2Gbranch(^^^';T)Gdrift(^^^';'c) (2.24) 

Gbranch(^^^';i;) = CXp [-% {E - Ej}] (2.25) 



GdnftiR^R';^) = exp -{R'-R-TixfQiR)}^ /4Xx 



(2.26) 



where E = [Et{R) +Et{R')] /2. 

The algorithm is started by generating a collection of configurations ("walkers"), usu- 
ally sampled from \^t- Equation (2.23) proceeds by applying a drifting random walk to 
each particle. The new position of the ith particle is given by 

r'i = ri + XxFq (R) + VtXz X (2.27) 

where % is a normally distributed random variable with zero mean and unit variance. In a 
simple interpretation of Eq. (2.23), this would always be the new position. But consider 
the case if becomes the true ground state, ^q. The branching term is then constant 
and the algorithm becomes similar to VMC. In this case we want to sample the correct 
distribution for any T. This is done by adding a Metropolis rejection step, where the trial 
move is accepted with probability 

\\fT{R'fG{R^R',xy 



mm 



1, 



(2.28) 



'yVTiR)^GiR'^R,x) 

Each configuration is then weighted by Gbranch- Because of rejections in the previous step, 
the time, x, in Eq. (2.25) should be replaced by Xgff, which is 

Xeff=%^x (2.29) 

where (^"accepted) mean square displacement of the accepted electron moves and (/"^otai) 
is the mean square displacement of all the proposed electron moves. 

The weighting is done by adding or removing configurations from the collection (branch- 
ing). This is done by computing the multiplicity M = int( Gbranch +y)^ where y is a random 
number on [0, 1]. This multiplicity is the number of copies of this configuration that should 
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be retained in the collection of walkers. If it is zero, the configuration is deleted from the 
collection. If it is one, the configuration remains as is. If it is greater than one, additional 
copies of this configuration are added to the collection. 

The number of walkers in the collection is kept roughly constant by adjusting Ej. In 
particular, the trial energy is adjusted according to 

Et^Eo + Kln{P*/P) (2.30) 

where Eq is the best guess for the ground state energy, P is the current population, P* is the 
desired population, and K is a feedback parameter. 

The energy is computed by averaging the local energy over the distribution of walkers. 
Once the transients have decayed away, subsequent steps are part of the ground state dis- 
tribution. The program is then run for however long is necessary to gather statistics for the 
energy and other estimators. 

There is a problem with DMC for estimating quantities other than the energy. The 
expectation value is not averaged over the ground state, but over the mixed distribution 
(|)o\|/r. This can be partly corrected by using the extrapolated estimator, 

((^o|A|^o) ~ 2(^o|A|\|/r) - (w|A|w) (2.31) 

Getting the correct estimator (also called a pure estimator) requires "forward walking", 
so named because the weight needed, (|)o/\|/r, is related to the asymptotic number of chil- 
dren of each walker (Liu et al, 1974). This can be implemented by storing the value of 
the estimator and propagating it forward with the walker for a given number of steps (Ca- 
suUeras and Boronat, 1995). 

2.5.1 Fermions 

In all these methods, some quantity is treated as a probability, which requires that it be 

I i2 

positive. In VMC this quantity is , which is always positive. For DMC, we sample 
from y\fT^o, which can be negative if the fermion nodes of are not the same as the nodes 
of ^Q. The simplest cure is to fix the nodes of the ground state to be the same as This 
is known as the fixed-node approximation. It is implemented in the DMC algorithm by 
rejecting moves that would change the sign of the determinant of 
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Figure 2.2: Examples of statistical data analysis using reblocking. The error in the graph 
on the left has converged, while the error in the graph on the right has not. 



2.6 Statistical Errors 



The formula for the variance given in Eq. (2.5) assumes that there are no serial correlations 
in the data. However, the Metropolis sampling method produces correlated data, which 
must be considered when estimating the statistical error. 

Correlations in data are quantified by the autocorrelation function, defined for some 
estimator, E, as 

= I - ^) ^^i+k - ^) (2.32) 

The autocorrelation time, K, is computed as 

2 cutoff 

K = 1 + — £ C{k) (2.33) 
^ k=l 

This sum tends to be quite noisy. As a heuristic strategy, we can approximate K by the first 
place where the autocorrelation function drops below 10%. The true variance of the mean 
is the simple variance of the individual data points multiplied by K. 

Another way to estimate the true error is by reblocking (Flyvbjerg and Petersen, 1989; 
Nightingale, 1999). At the second level, take the average of every 2 data points. Now com- 
pute the variance of this set of data that has A'/2 points. Continue this procedure recursively 
until the variance stops changing. Nightingale (1999) gives a well-defined procedure for 
computing when that occurs. Figure 2.2 shows some example plots of error vs. reblocking 
level. On the left hand graph we see the expected plateau in the error estimate. On the right 
hand graph there is no plateau, indicating that there is not enough data to reliably estimate 
the error. 
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2.7 Wave Functions 



For our studies of molecular hydrogen, we started with the wave function \|/ni from Reynolds 
(Reynolds et al, 1982). The Jastrow factors are 



(2.34) 



where Z is the nuclear charge and b is the variational parameter. The cusp conditions are 
satisfied by setting = 1 and = 1 /2. As noted in Appendix C, having the correct cusp 
condition for parallel spins does not affect the energy much, so the same value for ag is used 
for parallel and antiparallel electron spins. The b's from the two types of Jastrow factors 
are folded into a single parameter, ^ = a/b^. 

The orbitals are floating Gaussians, with the form 



(|)/(r) =exp 



2 

wf 



(2.35) 



where c/ is the center of orbital, and wi is a free parameter. In molecular hydrogen, c/ will 
be fixed at the bond center. 

The orbitals can be generalized to be anisotropic, 

^/(r) = exp [-(r - df ■ R^FR • (r - c/)] (2.36) 

where F is a diagonal tensor and R is a rotation matrix. There are two parameters - the 
width along the bond direction (rotated so as to be the z-axis), and the width perpendicular 
to the bond direction. The elements of F are defined to be (1 /w^, 1 /w^, 1 /wj). 

Finally, additional energy reduction was found for the isolated H2 molecule by multi- 
plying the orbital by (1 + ^ |r — c/|), where ^ is a variational parameter. 



2.8 Periodic Boundary Conditions 

The effects of an infinite system can be approximated by imposing periodic boundary con- 
ditions on a finite system. Every particle in the system then has an infinite number of im- 
ages. Inter-particle distances are calculated using the minimum image convention, which 
uses only the distance to the closest image. 

Care needs to be taken with the wave function when using the minimum image conven- 
tion. As the inter-particle distance crosses over from one image to another there can be a 
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discontinuity in the derivative of the wave function, leading to a delta function spike in the 
energy. If this is not accounted for, the VMC energy can become lower than the true ground 
state because this delta function term in the energy has been neglected. Additionally, the 
Gaussian orbitals can lower their energy by having a width comparable to or larger than the 
box size. Then sections of the orbital with large kinetic energy are outside L/2, and do not 
get counted in the integral. This can be fixed by summing over images, or by insuring the 
wave functions have the correct behavior at ±L/2. We use the latter solution. 

The orbitals are multiplied by a cutoff function that ensures its value and first derivative 
are zero at the box edge. The function we use is 

Mr) = 1 - exp [-Y,(r-r«)2] (2.37) 

where is fixed at L/2 and jc is a variational parameter. 

The Jastrow factors are constructed so that they obey the correct cusp conditions as 
r — > and so that the first and second derivatives are zero at < L/2. The simplest 
function that satisfies these conditions is a cubic polynomial. Let y = r/rm- Then 

u{y) — a\y -\- a2y^ -\- tt'iy^ , (2.38) 

where ai = (cusp value) * r^, 02 = — «i> and 03 = ai/3. Variational freedom is gained 
by varying r^, and by adding a general function multiplied by y^{y —\)^ to preserve the 
boundary conditions. We choose a sum of Chebyshev polynomials as the general function 
(Williamson et al, 1996). The full Jastrow factor is then 

u{y) = -/ + \y') +y\y - l?Y.^iTi{2y - 1), (2.39) 

where and the bi are variational parameters. We use five Chebyshev polynomials for the 
electron-electron part and another five for the electron-nuclear part. We optimized one set 
of rm and bj parameters for all electron-electron pairs in any particular system, and another 
set of parameters for all electron-nuclear pairs. 

Comparisons of the energy and variance of various combinations of forms for the orbital 
and Jastrow factors are shown in Table 2.5. The variational parameters are given in Tables 
2.6 and 2.7. A comparison of the electron-electron Jastrow factors is shown in Figure 2.3. 
Their short range behavior is similar, but the long range behavior differs between the types 
of Jastrow factors. 

The quality of wave functions is often measured by the percent of correlation energy 
recovered. For H2, the HF (no correlation) energy is —1.1336 Hartrees and the exact (full 
correlation) energy is —1.17447 Ha. Sun et al. (1989) compared a number of forms for 
electron correlation functions. Their best value recovered 80% of the correlation energy. 
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Figure 2.3: Optimized electron-electron Jastrow factor for different forms. 

Table 2.5: Comparison of energies and variances for various forms for orbitals and Jastrow 
factors for a single H2 molecule. 





Orbital 


Jastrow 


Energy 


Variance 


%CE 


A 


Isotropic 


simple Pade 


-1.1598(4) 


0.046 


64.0(9) 


B 


Anisotropic 


simple Pade 


-1.1643(2) 


0.040 


75.0(6) 


C 


Anisotropic + ^ 


simple Pade 


-1.1653(2) 


0.033 


77.5(5) 


D 


Anisotropic + ^ 


cubic 


-1.1688(2) 


0.039 


86.1(6) 


E 


Anisotropic + ^ 


cubic+Chebyshev 


-1.1702(2) 


0.046 


89.6(5) 



Using one of these forms, but with better optimization, Huang et al. (1990) recovered 84% 
of the correlation energy. Snajdr et al. (1999) obtained 93% of the correlation energy using 
a Linear Combination of Atomic Orbitals (LCAO) form with Is, 2s and 2p orbitals, and 
using the Jastrow factors of Schmidt and Moskowitz (1990). 

The variance is higher with those wave functions involving the cubic polynomial, even 
though the energy is lower. I believe this is mostly likely because the cubic polynomial 
does not have the correct 1 /r behavior at large r, but the simple Pade form does. This 
long range behavior contributes little to the average of the energy, but it contributes more 
significantly to the variance. 
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Table 2.6: Values of variational parameters for H2. 





Orbital parameters 


Jastrow parameters 


A 


w = 2.74 




p = 9.913 


B 


Wxy = 2.514, = 2.977 




p = 10.002 


C 


Wxy = 2.416, = 2.833, ^ = 


0.0445 


p = 9.958 


D 


Wxy = 2.357, = 2.628, C, = 


0.248 


e-e rm = 5.404 
e-n r,n = 5.376 



Table 2.7: Values of variational parameters for wave function E 



Component 


Parameters 


Orbital 


Wxy 


= 2.299 


Wz = 


= 2.515 


c 


= 0.301 


Electron-electron Jastrow 




= 6.281 


bQ-- 


= -1.012 


bi 


= 0.193 




bi 


= 0.619 


b3 = 


= 0.025 


b4 


= 0.138 


Electron-nuclear Jastrow 




= 5.329 


bo = 


= -2.084 


bi 


= 0.153 




bi 


= 0.952 


b3-- 


= 1.217 


b4 


= 1.027 
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Chapter 3 

Energy Difference Methods 



Very often it is the difference in energy between two systems that is of interest, and not 
the absolute energy of a single system. For a quantity such as the binding energy, we want 
the difference between the energy of the molecule and the energy of the free atoms. In our 
CEIMC simulations, we want the change in energy from moving a few nuclei. In VMC 
optimization, we want to know the change in energy from modifying some of the wave 
function parameters. 

Correlated sampling methods can provide a more efficient approach to computing these 
energy differences. But the widely used reweighting method has some drawbacks. We 
will introduce a new method that alleviates some of the drawbacks of reweighting while 
retaining its advantages. 



The simplest, and most straightforward way of computing the difference in energy between 
two systems is to perform independent computations of the energy for each system. Then 
the energy difference and error estimate are simply 



This method is simple and robust, but has the drawback that the error is related to the 
error in computing a single system. If the systems are very similar, either in variational 
parameters or in nuclear positions, the energy difference is likely to be small and difficult 
to resolve, since Oi and 02 are determined by the entire system. Similarities between the 
systems can be exploited with correlated sampling. 



3.1 Direct Difference 



AE = E1—E2 



(3.1) 




(3.2) 
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3.2 Reweighting 



Reweighting is the simplest correlated sampling method. 



AB = E1—E2 



(3.3) 



An estimate of AE for a finite simulation is 

1 

N 



AE 



E 



EniRi)- 



w{Ri)ELi{Ri) ' 
Liw{Ri) 



(3.4) 



where w = \|/2/v|/i. The same set of sample points is used for evaluating both terms. 

Reweighting works well when and \|/2 are not too different, and thus have large 
overlap. As the overlap between them decreases, reweighting gets worse due to large fluc- 
tuations in the weights. This effect can be quantified by computing the effective number of 
points appearing in the sum in Eq. (3.4), which is 



A'eff = 



(3.5) 



Eventually, one or a few large weights will come to dominate the sum, and the effective 
number of points will be very small, and the variance in AE will be very large. 

Particularly pernicious is the case when the nodes differ between the two systems. The 
denominator of the weight can easily be very small, causing a very large weight value. This 
is encountered when using reweighting to optimize orbital parameters in VMC (Bamett 
etal, 1997). 

In Eq. (3.4) we derived reweighting by drawing points from and computing the 
properties of both systems from them. It could also be derived by drawing points from \|/2 
as well. We can compute the energy difference both ways and take the average. This gives 
us the symmetrized reweighting method. 



AE = 



1 



2N ^ 



ELl{Ri) 



W,{Ri)EL2{Rd 



+ 



- E 

2N ^ 



Wy{R,)EM 



Etim 



(3.6) 



where Wx = ^\/^\ and Wy 



¥i/¥2- 
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3.3 Bennett's Method for Free Energy Differences 



First let us digress to discuss computation of the normalization integral. It was mentioned 
earlier that the Metropolis sampling method makes it difficult to extract information about 
the normalization integral, which the partition function in the classical case. Bennett (1976) 
demonstrated a method for finding the free energy difference between two systems. We will 
describe his method in terms of a ratio of normalizations. 

We can compute the ratio of two normalizations, Qi and Q2, in a fashion very similar 
to reweighting. 

Q1/Q2 = j dRyif\{R) I j dRy^l{R) (3.7) 
'^^¥2^^/ JdRyviiR) (3.8) 

This is a one-sided estimate, because it only uses samples from system two to compute 
properties of system one. Note that this sum is the same as the sum of the weights used in 
reweighting in Eq. (3.4). 

Bennett improved on this one-sided estimate, starting with an identity written as 

QiJdRYiWy^i 

where W is an arbitrary weight function. He found the optimum W by minimizing the 
variance of the free energy difference, 

Woe (3.11) 

Let Q = Q1/Q2. Inserting Eq. (3.11) into Eq. (3.10), we get 

1 = ^ (3.12) 

JdR^y,/{Q^^,l + ^Vj) 

Let X represent the configurations sampled from and y the configurations sampled from 
\|/2. The finite sample version of this equation is 



(3.13) 



22 



The value of Q can be found by a simple iteration 



Qn+l = Qn 



¥t(>')+e«v2(>v) 



(3.14) 



The iteration is started with 2o = 1 and stopped when the correction factor in brackets is 
sufficiently close to one. Typically convergence takes less that ten iterations, but if Q is 
much larger or smaller than one it can take more iterations. 

We have written these formulas assuming that the number of sample points from each 
system is the same. Bennett derived them for case with differing numbers of samples in 
each sum, and found the best variance was usually very near an equal ratio of computer 
time spent on each system. In our case the systems are of equal complexity, so this means 
using equal numbers of points is optimal, or very nearly so. 

By properly combining information from both systems, we can get a much better (lower 
variance) estimate of the ratio of their normalizations than if we had used information from 
only a single system (one-sided sampling). 



3.4 Two-Sided Sampling 

We can apply this notion to computing the energy difference between two quantum sys- 
tems. Consider sampling from some distribution that contains information about both 
and \|/2. The simplest such distribution is 



1 

^=2 



Qi Qi 



The energy difference can be written as 

jdRyiflEn j dRyiflEu 



AE 



Qi Qi 
= I dRP 



(3.15) 



In the finite case, we have 

It is important to note that the sum covers samples taken from both and \\r2. The sum 
includes both "direct" terms (eg. and En evaluated at configurations sampled from \\fi) 
and "cross" terms (eg. V|/i and Eli evaluated at configurations sampled from \\f2). 
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The denominator of the first term in Eq. (3.17) is 

(3.18) 



The ratio Q = Qi/Qi^^ computed by the Bennett method. The denominator of the second 
term can be computed similarly. 

One major feature of the two-sided method is that it reproduces reweighting in the 
large overlap regime, and the direct method in the low overlap regime. In the intermediate 
regime, it smoothly joins the two limits. 

To show this, first consider the case where the two systems are very different and the 
wave functions have low overlap. Here '^\{yi) and '^2i^i) ^^^^ small. Expand Eq. (3.17) 
into its four terms 

1 ^ y^\{xi)ELi{xi) 1 ^ y^l{yi)EL2{yi) 

^ OAT 2^ 1 r.„2/^A I r^.„2^^^^ omZ^ 1 



direct 



^ V ' 

cross 

Each denominator will have one large term (v|/f (jc,) or \|^2(>'i)) ^^'^ small term 
or \^2i-^i)). The value of Q is moderate compared to the wave functions, so it will not affect 
the relative sizes of these terms. Always having one large term in the denominator means 
there will never be any excessively large contributions to the sum resulting from division 
by a small value, as happens in reweighting. The cross terms have a small value (^f ()^) or 
'^2ix)) in the numerator, and so vanish. The large terms in the denominators in the direct 
terms cancel the \|/^'s in the numerator, and we are left with 

^ ~ ^ I^^Li (xi) - ^ I^L2(y/) (3.20) 

X y 

which is just the direct method. 

Now for the case where the systems are very similar and have large overlap. Recall 
from Eq. (3.9) that we can write one-sided estimates for Q as 

Q = ll.Myi) = ^/iL''-(^i) (3.21) 

where Wy = '^\{yi) M\{yi) and Wx = vll-^O Write the four terms of Eq. (3.17) in a 

different order 

}_y En (Xj) Qy Wx{Xi)EL2{Xi) 

N^l+Qwx{x,) l+Qwx{xi) 

1 y Wy{yi)En{yi) ly EL2{yi) .^22) 
QN^ \+Wy{yi)/Q N^\+Wy{yi)/Q 



AE = 



+ 
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Approximate the denominator of each term by two, replace the leading Q's with the appro- 
priate one-sided approximation, and we get 



AE 



+ 



-E 

2N^ 



Eli {xi) 



W^{Xi)EL2{xi 



JjUxWxiXi) 



Wy{yi)ELi{yi) 
h^xWyiyi) 



■Eiiiyi) 



(3.23) 



which is the symmetric version of reweighting given in Eq. (3.6). 

Due to computational considerations, it is useful to divide Eqns (3.13) and (3.17) by 
or \|/2 as appropriate, and work with the resulting ratios Wx = y^\/y^\ and Wy = \^\/\^\, as 
was done in Eq (3.22). The values of the wave functions can easily over or under flow dou- 
ble precision variables. It is best to use the log of the wave function, take differences, and 
then exponentiate. Furthermore, an arbitrary normalization of the wave functions makes 
no physical difference, but can result in very large or small numbers, even after taking the 
difference of the logarithms. This problem is ameliorated by subtracting the average value 
of the log of the wave function from the individual values. Sometimes this is not enough, 
however, and the value of the energy difference exceeds the range representable in a double 
precision variable, indicated by NaN (Not a Number). In this case, the overlap is clearly 
very small and the two-sided method should give the same results as the direct method. 
The program checks for the energy difference being NaN, and if so, it substitutes the di- 
rect method result (the data collected for the two-sided method is a superset of that needed 
for the direct method). Having done this, the subroutine computing the two-sided energy 
difference will always return a reasonable answer, an important consideration for a core 
routine in a program. 



3.5 Examples 

The first example is of two H2 molecules in a parallel orientation as shown in Figure 3.1. 
The bond lengths are at equilibrium, 1.4 Bohr, and the starting separation between the 
molecules is J = 2.5 Bohr. 

The energy difference between that configuration and configurations with other inter- 
molecular distances was computed using the direct method, the two-sided method, and 
reweighting. The resulting energy differences are shown on the left in Figure 3.2. Note 
that reweighting gets the wrong answer at large separations. This is most likely due to a 
finite sample size bias. More important is the error in that energy difference, shown on the 
right in Figure 3.2. Note that both reweighting and the two-sided method have errors that 
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Figure 3.1: Two H2 molecules in a parallel configuration 




-2 -1 1 2 3 4 -2 -1 1 2 3 4 

Move distance Move distance 



Figure 3.2: Energy difference (left) and the estimated statistical error (on logscale) (right) 
for two H2 molecules in a parallel configuration, starting from d=2.5 Bohr. 

drop to zero as the overlap increases. This graph also clearly shows the properties of the 
two-sided method mentioned previously, behaving like reweighting at small changes in the 
separation (large wave function overlap), and smoothly crossing over to the direct method 
for at large changes in the separation (small wave function overlap). 

Reweighting and the two-sided method may give biased results because there are a 
finite number of sample points in the sums in Eqns. (3.6) and (3.17). To test for this, a sum 
of a given length is repeated many times and the average energy difference for that length 
computed. The test for a bias was performed on a Li2 molecule. The energy difference 
was computed between a bond length of 4.5 Bohr and the equilibrium bond length of 5.05 
Bohr. Figure 3.3 shows the results for different numbers of points in the sum. Reweighting 
shows a much larger finite sample size bias than the two-sided method, which has almost 
none. 
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Figure 3.3: Finite sample size bias in the energy difference of Li2. 
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3.5.1 Diffusion Monte Carlo 

Using the two-sided method (or reweighting, for that matter) with DMC is sUghtly more 
compUcated. The reweighting transformation appUed to the basic DMC iteration gives 

h{R!;t + x) = j dRf{R-t)G^{R^R'-x) (3.24) 

= /.«/(«V)G.(.^«';x)||§^ (3.25) 

The weight w = G\/G2 must be computed over several iterations. The final weight used in 
the correlated sampling formulas is a product of the weights of every iteration. 

The weight factor is not quite right, due to the rejection step. Since the rejection ratio for 
DMC is very small (< 1%), ignoring the issue should not introduce a large error. Umrigar 
and Filippi (2000) give a more sophisticated method for dealing with rejections. 

The fixed-node condition also has to be obeyed, and configurations that cross a node 
while projecting have their weight set to zero. 

A version of reweighting was implemented by Wells (1985) as the differential Green's 
function Monte Carlo method (actually DMC). He used the response to an external field to 
determine the dipole moment of LiH. The same trial function was used, so the drift term 
was the same between both systems. Only the branching term was different; that entered 
as a weight factor. In our case, the trial function and the nuclear positions may be different 
between the two systems. 

The top of Figure 3.4 shows the difference in DMC energies using the various methods. 
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The energy difference was computed starting from the equilibrium bond length of 5.05 
Bohr. Partly because of the need to project for several DMC steps, the two-sided method 
has a fairly small range where it does better than the direct method (compared with the 
range for VMC). For comparison, the VMC results are shown at the bottom of Figure 3.4. 
There are two lines in the DMC graph for the direct method. The implementation had a 
limitation where only one projection to accumulate the weights would occur at a time. We 
used 30 steps in the projection, and consequently could only get the weights once every 30 
steps. The limited data line is computed from data collected once every 30 steps (the same 
amount of data available to the correlated methods) and is the line the two- side method 
joins on to. The full data line used all the data available in the simulation and so has a 
lower statistical error. 

3.5.2 Binding Energy 

To compute the binding energy, let the non-interacting system be \|/2, and the fully interact- 
ing system be The nuclear positions are the same for both systems, and the appropriate 
interaction terms are set to zero for the non-interacting system. 

A pair of H2 molecules in a parallel configuration was used as the test system. The 
binding energy we are interested in is that of the interacting molecules minus separate H2 
molecules (and not the separate atoms). 



There is a problem in that the electrons in the fully interacting system can switch 
molecules and have no effect on the computation, but these configurations are very unlikely 
in the non-interacting system. This leads to an artificially small overlap. The solution in 
this symmetric case is to restrict the domain of integration. The electron coordinates are 
ordered along the x-axis so that xi < X2 and X3 < X4. 

To see that this restriction is exact, consider the integral 



where / corresponds to the electron-electron Jastrow factor and g is symmetric under the 
interchange of xi and X2 and corresponds to the electron-nucleus Jastrow factor and the 
square of the Slater determinant. Change variables to R= {xi +X2)/2 and r = xi—X2. 
Now we have 



The integral over r is even, and we only need to integrate over half of the interval (r < or 
r > 0), which corresponds to the restrictions xi < ^2 or jci > X2. 



Eb=E{{H2)2)-2E{H2) 



(3.26) 




(3.27) 




(3.28) 
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Figure 3.4: Error in energy difference of Li2 using DMC (top) and VMC (bottom) 
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Figure 3.5: Error in VMC binding energy of H2-H2 system 

Figure 3.5 shows the error in the VMC binding energy for various intermolecular dis- 
tances. Without restricting the domain of integration, reweighting performs poorly, and the 
two-sided method reproduces the results of the direct method. With the restricted domain, 
the correlated methods perform quite well. 
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Chapter 4 

Wave Function Optimization 



Variational Monte Cario (VMC) depends crucially on the optimization of parameters in the 
wave function to find the minimum energy. The general problem of function optimization 
is a well-studied area. For a general introduction to various optimization techniques, see 
Press et al. (1992). For more in-depth work, consult Polak (1997), Dennis and Schnabel 
(1983), or Ortega and Rheinboldt (1970). 

The main difficultly in applying these techniques to optimizing VMC wave functions 
is noise - we only get stochastic estimates for function values or gradients. Glynn (1986) 
describes several strategies for optimization in the presence of noise. We will divide these 
into three categories. 

The first strategy is to convert the problem into a nearby smooth, non-noisy problem, 
and solve that problem instead. Fixed sample reweighting takes this approach by sampling 
some set of configurations and optimizing with just these configurations. 

The second approach is to reduce the noise to negligible levels, and proceed with reg- 
ular optimization techniques. This is possible with a Newton method, where the first and 
second derivatives of the function are computed, and the number of iterations needed for 
convergence hopefully is small. 

The third approach is to use a method tailored to handle noise. The Stochastic Gradient 
Approximation (SGA) is such a method. Also somewhat in this category, we will examine 
a method that is essentially a biased random walk, and the moves are accepted or rejected 
based on the whether or not the energy decreases. 

These approaches will be compared on several problems of different sizes to see how 
they scale. We will use a single H2 molecule, and collections of 8,16, and 32 H2 molecules 
in a box as trial problems. 
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4.1 Energy vs. Variance Minimization 



There is a choice of objective functions - either the energy or the variance of the energy 
can be minimized. Under certain circumstances, variance minimization is more stable 
that energy minimization. For the reweighting method, this is definitely true, but it may 
not be the case for the other optimization techniques. It is generally held that variance 
optimization would produce better values for observables other than energy (Williamson 
et al, 1996), but this may not always be the case (Snajdr et ai, 1999; Snajdr and Rothstein, 
2000). The argument is that the variance is more sensitive to parts of the wave function 
that do not contribute to the energy. As we have seen in Chapter 2, having incorrect long 
range behavior in the H2 wave function does not affect the energy much, but does cause 
the variance to rise. In other words, variance minimization should yield a "smoother" wave 
function, which should then have better non-energy properties. 



4.2 Fixed Sample Reweighting 

Fixed sample reweighting with minimization of the variance was popularized by Umrigar 
et al. (1988), and has been used extensively since then. The current state of the art is 
described by Kent (Kent et al., 1999; Kent, 1999). 

The core of the method is the single sided reweighting method described in Chapter 
3. A number of configurations are sampled from a distribution with variational parameters 
ao. The energy at an arbitrary value of the variational parameter, a, is computed by 

£(a) =£w(y?,-;a)£L(i?/;a)/£w(7?i;a) (4.1) 

i i 

where w{Ri; a) = ^^{Ri; a) /\\f^{Ri; ao). Alternatively, one could compute the variance by 
A(a) = Y,w{Ri;a){EL{Ri;a) -Ejf /Y,w{Ri;a) (4.2) 

i i 

where Et can either be the weighted average energy (4.1) or it could be a guess at the 
desired energy. 

The weights in these expressions can get very large when the variational parameters 
move far from the sampled value ao, and especially when the parameters that affect the 
nodes are adjusted. Then just a few configurations will dominate the sum, and the energy 
estimator can often give meaningless low values. The variance estimator, however, will 
remain more stable in this situation. For either estimator, the best fix is to regenerate the 
configurations being used when the parameters move too far away from ao. This can be 
used in conjunction with the enhancements described below. 
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A second advantage of the variance estimator is that the weights can be modified with- 
out changing the location of the minimum (Kent et ah, 1999). The same is not true for the 
energy. The problem of a few large weights can be solved by limiting them to a maximum 
value (Filippi and Umrigar, 1996), or more simply by just setting them all to one (Schmidt 
and Moskowitz, 1990). Bamett et al. (1997) tame the fluctuating weights by sampling 
from a positive definite guiding function. In this cases, Ej should be set to a best guess, or 
slightly below, because the energy estimator will not be reliable. 

Further increases in stability can be gained by limiting outliers in Eq. (4.2). Large 
outlying values have a disproportionately large effect on the variance, but their contribution 
is not that meaningful. Kent et al. (1999) gives a procedure choosing a cutoff that will 
reduce its effect as the number of samples increases. We used a simpler approach, removing 
from the sum any values greater than 5 standard deviations from the average. 

Another efficiency improvement can be exploited when the Jastrow U is linear in the 
variational parameters. Then the variational parameters can be factored out of the sum over 
interparticle distances, and the value of that sum can be stored. Fixed sample reweighting 
has been applied mostly to optimizing Jastrow factors, and not parameters in the Slater 
determinant, so this results in a dramatic time savings. In our case we have both Jastrow 
and determinantal parameters, and the code spends about 40% of its time computing the 
Jastrow factor. This percentage will decrease as the system size increases, since the Jastrow 
computation is 0{N^) but the determinantal part requires matrix work that is 0{N^). We 
did not implement this improvement, so bear in mind when perusing the results that the 
reweighting time could be reduced, probably by 30%. 

An additional advantage of reweighting is that, since it is solving a smooth problem, 
an off-the-shelf minimizer can be used. We used the DSMNF general minimizer from 
the PORT library, which uses only the function values and does not need any derivatives. 
Routines to minimize sums of squares are also available, but we did not try them. The 
fixed sample reweighting algorithm is then: Generate a set of configurations and minimize 
the variance with this set. Generate a new set of configurations using the new variational 
parameters and find the minimum variance again. Repeat for several steps to ensure con- 
vergence. 

4.3 Newton Method 

The Newton method makes use of the first and second derivatives. We can approximate a 
function near its minimum as a quadratic surface 

/(x) ft. /(xo) + (x - xo) ^ ■ b + (x - xo) ^ ■ A ■ (x - xo) (4.3) 
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where bi = ^ and A,-,- = is the Hessian matrix. The location of the minimum is then 
given by 

xo = x-A-^-b (4.4) 

Since we are likely to start in a region where / is not quadratic, this step is iterated several 
times. 

This procedure, along with analytic evaluation of the derivatives, was applied to VMC 
energy minimization by Lin et al. (2000). Analytic derivatives of the local energy with 
respect to determinantal parameters are given by Bueckert et al. (1992), but these were 
used in the context of a reweighting minimization. 

Recall the VMC energy is computed by 

E={El) = J dR\\f^{a)EL{a) j J dR\\r^{a) (4.5) 

We want the derivatives of E with respect to various variational parameters, a. These could 
be computed with finite differences and reweighting, but it is better to do some analytical 
work on this expression first. 

Lin et al. (2000) use some Green's relations to eliminate explicit derivatives of the local 
energy (Ceperley et al., 1977), and derive the following expression for the gradient, 

dE 
dam 

where 



= 2[(£iVi„,J-(£i)(x|/|„,J] (4.6) 
, 3ln\|/ 1 



They also give the expression for the Hessian, 

= 2{{ELVi,m,n)-{EL)«,m,n) 

+2 [{El - (^l) (¥(„,«<„)] 



where 



and 



dan ^"'"'"'aa^ 
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Computing the first derivatives of the wave function analytically is relatively easy. 
Computing the second derivatives with respect to parameters in the Jastrow factor is also 
easy analytically. For parameters that appear in the determinant, however, second deriva- 
tives are more difficult. For this reason we compute most of the first derivatives analytically, 
and use these to compute the second derivatives with a simple finite difference scheme. The 
first derivative of the local energy was computed with finite differences. The derivative of 
the orbital cutoff parameter, which is the same for all the orbitals, was also computed with 
finite differences. 

An advantage of the Newton approach over the gradient-only approaches is that it has 
information about how big of step should be taken, whereas the step size is a parameter that 
must be tuned in the gradient-only methods. The drawback, though, is a greater sensitivity 
to noise. The gradient and Hessian must be sufficiently accurate, or the Newton iteration 
will get wildly wrong results. More precisely, it is the non-linear process of taking the 
inverse in Eq. (4.4) that causes the problem. Furthermore, this sensitivity to noise increases 
with the number of parameters. 

Another problem is parameter degeneracy, or near degeneracy. This will make the 
Hessian singular, or nearly so. Even if it not exactly singular, being nearly singular is the 
equivalent of dividing by a small number, which will also greatly magnify the effects of 
noise. The usual solution is use of the Singular Value Decomposition (SVD). See Press 
et al. (1992) or Kincaid and Cheney (1991) for a description of the algorithm. A more 
detailed look at "regularization" (of which the SVD is one method) is in Hansen (1998). 
The SVD starts by decomposing a matrix as 

A = PDQ (4.11) 

where P and Q are unitary matrices and D is a diagonal matrix. The elements of D are the 
eigenvalues of A^A. For our square, symmetric matrix, these are are the squares of the 
eigenvalues of A. We can also take P and Q to be the eigenvectors of A. The utility of the 
SVD is seen when we write the inverse of A as 

A"i=Q^D-ip^ (4.12) 

If A is singular, then at least one of its eigenvalues is zero. In this case, zero eigenvalues 
also indicate parameter degeneracy, so it's not really necessary to move in the directions 
corresponding to the zero eigenvalues. To avoid moving in these these directions, and to 
stabilize the inverse, set 1 /di in Eq. (4.12) to zero when di is smaller than some cutoff. 

With the eigenvalue decomposition we have an additional technique - negative eigen- 
values correspond to uphill directions and mean we are at a saddle point or are far from a 
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Figure 4.1: Examples using the Newton iteration with varying amounts of noise. 



region where the quadratic approximation is good. The simplest way of handling this is 
to ignore negative eigenvalues. So we remove small positive and all negative eigenvalues 
when solving Eq. (4.4). 

Some examples of this Newton iteration with 8 H2 molecules are shown in Figure 4.1. 
Ns is the number of samples used in computing the gradient and Hessian. Unless otherwise 
noted, the SVD method for solving Eq. (4.4) was used with removal of eigenvalues less 
than 0.01. Other runs without using regularization are not shown because they diverge very 
drastically. 

4.4 Stochastic Gradient Approximation 

The Stochastic Gradient Approximation (SGA) was designed by Robbins and Munro ( 1 95 1 ) 
to handle optimization with noisy gradients. It was first applied to VMC optimization by 
Harju etal. (1997). 

The SGA iteration can be written as 

ai = a,_i - /iY/Va£(a,-i). (4.13) 

where his a step size parameter and ji is some specially chosen series. 

^Much of the complexity of current Newton and quasi-Newton optimization methods is in deciding how to 
move the parameters when the quadratic approximation is not good. Typically it involves a line minimization 
in the gradient direction or some sort of back tracking. 
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There are some conditions on that must be satisfied in order for this iteration to 
converge. They are 



Y/>0 (4.14) 

oo 

J^Y, = oo (4.15) 

i=l 

OO 

L^<- (4.16) 

(=1 

The condition given by Eq. (4.15) allows the iteration to reach anywhere in parameter 
space. The condition in Eq. (4.16) is needed so the effects of noise will eventually be 
damped out. An obvious choice for y, is 1/z. For more discussion on these conditions and 
for some conditions on the objective function, see Young (1976) and Tsypkin (1971). 

We can analyze the convergence in the limiting case of no noise in one dimension. First 
let us make a continuous version of Eq. (4.13) by letting y(?) — dt/t and da — (l{t) — a{t — 
dt). Then in the dt ^ limit, the SGA iteration is 

^ = -^Va/(a(0) (4.17) 

Now let us assume that / has a quadratic form, /(a) = ^Aa^ +Ba + fo, with a mini- 
mum at a = -B/A. Now Eq. (4.17) is 

^ = -^[Aa+B] (4.18) 
at t 

The solution is 

a{t) = -B/A + aor^ (4. 19) 

where ao is a constant of integration. So we see that it will converge to the solution at 
? — i> oo, with a rate that is controlled by the curvature of the potential and our choice of h. 

Now consider generalizing to the case where y{t) — dt/t^. Our continuous equation is 
then 

^ = -^[Aa+5] (4.20) 



The solution is 

.1-8 



(4.21) 



a(t) = -B/A + aoexp -M?'"7(l-6) 

We see that the smaller 6 is, the faster the convergence. If there were no noise, 5 = would 
indeed be the best choice. 

Now let us represent the noise in the gradient with an additive noise term, r\{t). Then 
Eq. (4.20) is 

da h , , 

— = -^[Aa + 5 + Ti(0] (4.22) 
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Previously we considered case where noise was negligible. Now consider the case where 
the noise dominates, so Eq. (4.22) becomes 



The solution is the integral 



^ = _!lW (4 23) 

a{t) = - dt-^ (4.24) 

To look at convergence, we need to compute the variance of a integrated over the noise. 
Take the noise to have a probability distribution P{x, t) with zero mean and variance o. The 
variance of a is then 

dxj dtP{x,t)-^ (4.25) 
If we take P{x,t) to have no dependence on t, the integrals factor and we get 

= -T^^ir (4.26) 

Here we see that larger values of 5 lead to faster convergence of the noise. Since smaller 
values of 5 lead to faster convergence of the non-noisy problem, we need an intermediate 
value of 5 to balance these effects. 

One variation, suggested by Nemirovksy and Yudi (1983), is to use 5=1/2 and use the 
cumulative average of the variational parameters. This value of 5 violates the condition in 
Eq. (4.16), but this condition is there to insure the noisy part converges. Instead we use the 
cumulative averaging process to remove the noise. 

Another acceleration technique involves monitoring the sign of the gradient (Tsypkin, 
1971). Far from the minimum the gradient will not often change sign between successive 
steps. Close to the minimum, the noise will eventually dominate, and the gradient will 
change sign more often. The acceleration procedure is to only update y, when the sign of 
the gradient changes. This also has the advantage of adjusting the convergence of each 
parameter separately. 

In practice, starting the series at yi = 1 tends to make the first steps have a dramatically 
larger effect on the parameters than subsequent steps. Often, the first few steps would move 
the parameters very far from the minimum, and then the iteration will take a long time to 
converge. In this work we started the series at ? = 10 to minimize this effect. 

We tried several of these SGA variants on the box of 8 H2 molecules. We used h = 3 
when yi = \/\fi and h = \0 when y/ = 1 / i- This way the initial step sizes (give by hyt) were 
similar. Figure 4.2 shows the convergence of one of the variational parameters (r^ for the 
electron-electron Jastrow). The convergence of the energy is also shown. We see that the 
two accelerated methods converge faster than the simple SGA. 
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Figure 4.2: Examples of SGA. The graph on the top shows the convergence of one varia- 
tional parameter for several SGA algorithms. The graph on the bottom shows the resulting 
energy. 
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4.5 Gradient Biased Random Walk 



We introduce a new method that is made possible by the two-sided energy difference 
method in Chapter 3. Using this, it is relatively easy to determine whether a change of 
the trial parameters lowers the energy or not. This determination can be fitted onto a num- 
ber of methods, even a random walk. We evaluate the gradient and make a trial move in the 
gradient direction, similar to the SGA. Unlike the SGA, the move is accepted only if it low- 
ers the energy. Since the gradient is noisy, we are effectively making a random walk that is 
biased in the gradient direction, hence the name Gradient Biased Random Walk (GBRW). 
The trial move is 

ar = a;_ i - /i (a/_ i ) . (4.27) 

where h is randomly chosen from [0,/jniax]- To provide some simple adaptivity, /zmax is 
adjusted during the run. If a trial move is rejected, /zmax is decreased via multiplication by 
some factor, usually 0.5 or 0.6. If a trial move is accepted, it is increased by multiplying by 
the reciprocal of that same factor. 

Currently the level of convergence of this method is controlled by how well the energy 
difference is computed. In other words, once the energy differences are of the same size 
as the estimated error, it simply fluctuates. There are several possibilities for making a 
convergent method. The first is to take the cumulative average of the parameters, or add 
a damping parameter as in the SGA. The second is to increase the number of samples to 
compute the energy difference (and so decrease the noise) at each iteration. 

4.6 Comparison of methods 

We test the various optimization methods and compare their run times. The test systems 
are an isolated H2 molecule, and 8,16, and 32 molecules in a box at = 3.0, a fairly 
low density. Each system has 12 parameters in the Jastrow factor, and 3 determinantal 
parameters per molecule, plus one more for the box cutoff (which is the same for all the 
orbitals). Thus we have 15, 37, 61, and 109 variational parameters, respectively. For the 
starting parameters, we set the Jastrow cutoff to = 4.0, the orbital widths to 2.0, the 
orbital box cutoff to 1.0, and all the other parameters to zero. 

The Newton method used the regularization method with a cutoff of 0.01 for AT = 8, 16 
and a cutoff of 0.1 for = 32. No regularization was used for = 1. The SGA method 
used yi = \/\fi and parameter averaging. Reweighting used 16000 configurations for = 1 
and 1000 configurations for A^ = 8 and 16. We did not attempt reweighting on the largest 
system. 
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Figure 4.3: Optimization methods applied to (a) Single H2 (b) 8 H2's (c) 16 H2's (d) 32 

H2'S 



The best way to compare these methods would be to run them all many times starting 
from different random number seeds. The average of the resulting distribution would give 
the average quality of each method, and the spread of the distribution would indicate the 
stability. However, this is time-consuming and instead, as a first approximation we present 
the results for a single run of each method in Figure 4.3. The times are in hours on an AMD 
Duron 600 Mhz (which is approximately 1/2 to 2/3 the speed of a 195 Mhz RIOOOO in an 
SGI Origin). 

For the single molecule, it is clear that the Newton method is the best method. The 
reweighting method also performs well, and the two gradient methods take longer to con- 
verge. As the system size increases, however, the gradient methods do better, with the SGA 
method doing the best. 

The Newton method in particular has difficulty with stability as the system size in- 
creases. It needs to be run long enough so the noise is small enough that it does not affect 
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the results. 

The reweighting method performs surprisingly poorly on the larger systems. Looking 
more closely at the results of reweighting for the N = ^ case, we get a total energy of 
-9.244(2) Ha and a variance of 0.30. From the SGA we get E = -9.275(3) Ha and a 
variance of 0.42. From the GBRW we get E — —9.268(2) Ha and a variance of 0.36. It 
appears that the problem is with the variance minimization and we have a case where the 
minimum variance solution is not the lowest energy solution. Although on the scale of the 
total energy, the difference between reweighting and the SGA is only 0.3%. On the scale 
of the correlation energy in the isolated molecule, this difference is about 10%. 

4.7 Future Work 

We have compared a few basic methods for VMC parameter optimization. Many more 
improvements and modifications could be conceived and tried. 

Currently we ran these with set numbers of iterations and numbers of samples, then 
looked at the results, and perhaps made adjustments and tried again. What would be very 
helpful is some sort of adaptivity - adjusting the number of samples or even the type of 
method as the optimization proceeds in order to ensure convergence. 

So far the gradient-only methods seem to have the advantage, but have the disadvan- 
tage that they require a step size be set manually. In order to generate a trial step size 
automatically, a secant updating method could be tried, where successive gradient evalu- 
ations are used to build up an approximate inverse Hessian (Dennis and Schnabel, 1983). 
These methods are often superior to using the actual Hessian (Press et ah, 1992) , but it is 
not clear how the presence of noise will affect the algorithm. 

Finally, it would be instructive to perform these comparisons on systems containing 
atoms with higher atomic number. 
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Chapter 5 

Coupled Simulation Methods 



There are several issues we have to deal with when constructing an efficient CEIMC simula- 
tion. The first is noise from the QMC evaluation of the energy. We will discuss a modifica- 
tion to the Metropolis acceptance ratio, called the penalty method, that will accommodate 
noise. Next we will examine some of the details involved in a CEIMC simulation, and 
finally give results for a single H2 molecule. 

5.1 Penalty Method 

The Metropolis acceptance ratio, from Chapter 2, is min [l,exp(— A)], where A = — 
. The QMC simulation will yield a noisy estimate for A, which we denote as 5 
The exponential in the acceptance ratio is nonlinear, so that (exp(— 5)) 7^ exp((— 6)). 
The noise will introduce a bias into our acceptance ratio formula. To avoid this bias in our 
simulations, we can either run until the noise is negligible, or we can try find a method that 
tolerates noise. 

Typical energy differences for moves in our simulations are on the order of .01 — .05 
Ha. If we want an error level of 10% (statistical error of .001 Ha) ^ it would take about 7 
hours of computer time for a system of 16 H2 molecules. We need to perform hundreds 
of these steps as part of the classical simulation, so clearly a method that could tolerate 
higher noise levels would be very beneficial. The penalty method of Ceperley and Dewing 
(1999) does this, and our simulations run with noise levels on the order of .01 Ha, which 
only takes about 4 minutes of computer time. 

In the penalty method, we start with detailed balance, written as 

A{s^s')=A{s' ^s)exp[-A]. (5.1) 
'The usual error level considered chemical accuracy is 1 kcal/mol = .0016 Ha 
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To deal with noise, we would like to satisfy detailed balance on average, We introduce 
an instantaneous acceptance probability, a (5), that is a function of the estimated energy 
difference. The average acceptance probability is the instantaneous one averaged over the 
noise, 

/oo 
ddP{d;s ^ s')a{d) 
-oo 

The detailed balance equation we would like to satisfy is then 



(5.2) 



/oo p -| 

ddP{d;s ^ s') a{d)-e-\{-d) =0 
-oo L 



Suppose the noise is normally distributed with variance, a. Then 

(5-A)2n 



P(5) = (20^71) -V2exp 
A simple solution to Eq. (5.3) is 



2o2 



(5.3) 



(5.4) 



a (5) = min 



l,exp(-5-y) 



(5.5) 



The extra term causes addition rejections of trial moves due to noise. For this reason 

it is called the penalty method. 

To verify that the solution in Eq. (5.5) satisfies detailed balance (5.3), let us compute 
the average acceptance probability 

1 



A(A) = 



V2a^K J- 
1 



j5,-(5-A)V2aVn 1,.-S-V2 

-0V2 



J6e-(S-^)V2- + 



1 



J5 e-(5-A)V2a2^-5-aV2 



-o2/2-A 



1 



V2o^ Ja^/l-A 



= ^erfc((oV2 + A) jlo^) + ^e" W((oV2 - A) jlcs^) 

where we have made the substitutions 5' = 5 — A and 5" = 5' + a^. This expression for 
A (A) will satisfy detailed balance, A (A) = e^A(-A). 

In practice, both the energy difference and the error are being estimated from a finite 
set of data. Assume we have n estimates for the energy difference, ji, Estimates for 
the mean and variance are given by 



1 " 



1 
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(5.6) 
(5.7) 



and we have A = (5) and = (x^). 

The average acceptance ratio can be written as integral over 5 and y!}. The probabiUty 
distribution for the estimated error is a chi-squared distribution. An asymptotic solution can 
be formed by expanding a(5, and performing the integrals to get the average acceptance 
ratio. This is set equal to a power series for exp(— a^/2), and by matching powers of a we 
get the coefficients for the original series 

fora(5,x2). This series can by summed to get a 
Bessel function, hence we call it the Bessel acceptance formula. It is convenient to expand 
the log of the Bessel acceptance formula in powers of jn. The Bessel acceptance formula 
is then 

a(5,%^,n) = min[l,exp(— 5 — Ms)] (5.8) 

where 

„ I , x'(5n + 7) 

^ 2 4(n+l) 3(n+l)(n + 3) 8(n + 5)(n + 3)(n + l)2)^ ^^'^^ 

Note that as n gets large, only the first term is important, which is just the regular penalty 
method. 



5.1.1 Other methods 

There is another method for handling noise, originally proposed by Kennedy, Kuti, and 
Bhanot (Kennedy and Kuti, 1985; Bhanot and Kennedy, 1985), that uses a power series 
expansion of exp[— 5] to construct an unbiased acceptance ratio. It has an advantage over 
the penalty method in that it does not assume any particular distribution for the noise. The 
method has a major drawback in that it depends on the value of 5 not becoming too large, 
and not just the error estimate for 5. This could severely restrict the maximum steps sizes 
for moving the nuclei in our simulations. Methods for dealing with this restriction has 
recently been addressed by Lin et al. (1999) and Bakeyev and de Forcrand (2000), but we 
did not explore these extensions. 



5.1.2 Handling noisy data 

Using noisy data requires care in handling. Particularly, inappropriate reuse of any single 
estimated value can lead to biased results. For instance, in a classical simulation the en- 
ergy difference would be computed, and one of the two energies involved would be used 
in accumulating the average energy. See the top of Figure 5.1 for an outline of such a sim- 
ulation. However, this leads to a bias when noisy energies are involved. This can be seen 
by considering a negative fluctuation in the energy of the trial move. This will make the 
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energy difference smaller (or more negative) and hence more likely to be accepted. Thus 
the negative fluctuations would be preferentially added to the accumulated average, and 
bias the result downward. 

This program outline could be corrected by computing a new value for the energy in 
the average. However, there is another arrangement that is more amenable to the energy 
difference methods of Chapter 3. The computation of the energy used in the average is the 
same quantity needed for the old energy in the next iteration. So that computation can be 
moved to the next iteration, as shown on the bottom in Figure 5.1. 

Several points are illustrated with a system of two particle interacting via a Lennard- 
Jones potential with £ 0. 1 and a —1.5^. The temperature of the system was 3 160K (P = 
100). Noise was simulated by adding a Gaussian random variable with known variance 
to every energy computation. Results for several algorithms versus noise level are shown 
in Figure 5.2. The top curve shows the bias that results from having no penalty method. 
The middle curve is the correct method, which we see is independent of the noise level. 
The bottom curve demonstrates the bias from reusing the energy involved in making the 
accept/reject decision. 

The noise level of a system can be characterized by the relative noise parameter, / = 
{^a)^t/to, where t is the computer time spent reducing the noise, and to is the computer 
time spent on other pursuits, such as optimizing the VMC wavefunction or equilibrating 
the DMC runs. A small / means little time is being spent on reducing noise, where a large 
/ means much time is being spent reducing noise. The efficiency of a CEIMC simulation 
can be written in terms of this parameter. Our paper gives an example of a double well 
potential, and finds the noise level that gives the maximum efficiency. Generally it falls 
around Pa = 1 — 2, with the optimial noise level increasing as the relative noise parameter 
increases. The one exception occurs when computing the first moment, which is sensitive to 
crossing the barrier between the double wells. These crossings are assisted by an increased 
noise level, hence the optimal noise level is much higher. 

5.2 Pre-rejection 

We can apply multi-level sampling ideas (see section 2.4.1 for an application to VMC) to 
our CEIMC simulations as well. The idea is to use an empirical potential to "pre-reject" 
moves that would cause particles to overlap and be rejected anyway. 

In this case, the trial move is proposed and accepted/rejected based on a classical po- 

•^Here a is the length scale for the LJ potential 
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Compute old Energy 



loop over Classical steps 

loop over Number of molecules 

make trial move (translation and rotation of H2 molecule) 



Compute trial Energy 



acceptance probability = min [l,exp (— PAB — (Pa)^/2)] 

accept/reject trial move 

if accept, set old Energy = trial Energy 
Use updated old Energy in average ^ 

end loop 
end loop 



loop over Classical steps 

loop over Number of molecules 

make trial move (translation and rotation of H2 molecule) 



Compute old Energy Compute trial Energy 



acceptance probability = min [l,exp (— PAB — (Pa)^/2)] 

accept/reject trial move 
Use old Energy in average ^ 

end loop 
end loop 



Figure 5.1: CEIMC program outlines. Boxes indicate quantum computations. The dashed 
box indicates a quantity saved from a previous computation. The top algorithm is incorrect. 
The bottom algorithm is correct. 
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Figure 5.2: Examples on a Lennard- Jones potential with synthetic noise. 



4.5 



tential 



Ai = min 



■exp(-|3AK/) 



(5.10) 



T{R' ^R) 

where AVd = Yd {R') — Vci (R) and T is the sampling probability for a move. If it is accepted 
at this first level, the QMC energy difference is computed and accepted with probability 



A2 = min [1 , exp(-pA VgMC - "s) exp(PAVc/)] 



(5.11) 



where ub is noise penalty. 

Compared to the cost of evaluating the QMC energy difference, computing the classical 
energy difference is free. Reducing the number of QMC energy difference evaluations is 
valuable in reducing the computer time required. 

In Chapter 7, using the pre-rejection technique with a CEIMC-DMC simulation results 
in a first level (classical potential) acceptance ratio of 0.43, and a second level (quantum 
potential) acceptance ratio of 0.52. The penalty method rejects additional trial moves be- 
cause of noise. If these rejections are counted as acceptances (ie, no penalty method or no 
noise), then the second level acceptance ratio would be 0.71. The classical potential is a 
fairly good representation for the DMC potential, and we can use that to reduce the number 
of DMC energy difference evaluations needed. 

5.3 Trial Moves 



Molecular moves are separated into translation, rotation and bond length changes. 
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Table 5.1: Efficiency of classical Monte Carlo for moving several particles at once. The 
table on the left is for low density system at rs = 3.0 and T=5000K. The table on the right 
is for a high density system at = 1.8 and T=3000K. The largest values of the efficiency 
are shown in boxes. 
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The Silvera-Goldman potential is used as the empirical potential for pre-rejecting trans- 
lational moves. Anisotropic potentials were tried for pre-rejecting rotational moves, but 
they did not work very well. It is not clear whether this was from the the potentials being 
derived for isolated H2-H2 interaction, or from inaccuracy in the trial wave function. 

Bond stretching moves were pre-rejected using the, essentially exact, H2 intramolecular 
potential of Kolos and Wolniewicz (1964). The new bond length is sampled uniformly from 
a box of size At, around the current position. Because of phase space factors we need to 
include a sampling probability of T{R) — l/R^ in the acceptance formula. 

The trial move for classical Monte Carlo is usually presented as either moving one 

particle at a time, or all of the particles at once. However, we can move other numbers of 

particles as well. Table 5.1 shows the efficiency for a classical system with 32 H2 molecules 

for two densities and temperatures. On the left is a low density system with = 3.0 and 

at a temperature of 5000K, and on the right higher density system with = 1.8 and a 

temperature of 3000K. For the lower density system, the highest efficiency occurs when 

moving half the molecules at a time. Relatively high efficiency can also be found moving 

2, 4 or 8 at a time as well. For the higher density system, the most efficient regime shifts 

towards smaller step sizes and fewer number of particles moved at a time. ^ 

^These results are not generally applicable to classical MC simulations, since much more efficient imple- 
mentations are possible for systems interacting with a two body potential. 
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Table 5.2: Results of CEIMC for isolated H2 molecule at T=5000K. 
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Figure 5.3: H2 bond length distribution. 

5.4 Single H2 molecule 

The CEIMC method was applied to a single H2 molecular in free space, at a temperature 
of 5000K. Exact results are obtained by integrating the potential of Kolos and Wolniewicz 
(1964). Results for the energy, pressure, and first and second moments of the bond length 
are given in Table 5.2. The Virial column is computed by 

Virial = [2(is:) + ('^^)] (5.12) 

This is related by the virial theorem to the force on the nuclei, which should be zero for an 
isolated molecule. In Chapter 7, we will see this expression used to compute the pressure. 

As we would expect, the VMC energy is higher than the exact energy. The other quan- 
tities are close to their expected values. Histograms of the bond length distribution are 
shown in Figure 5.3. Here again, both VMC and DMC reproduce the exact distribution 
well. 
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Chapter 6 
Hard Spheres 



A system of hard spheres is the simplest non-ideal many body system to study. Statis- 
tical Monte Carlo techniques and molecular dynamics were first applied to hard spheres 
(Metropolis et ah, 1953; Alder and Wainwright, 1957). Additionally, some of the first ap- 
plications of field theoretic methods to condensed matter systems were on hard spheres. 
More recently, the achievement of BEC in trapped atomic gases has renewed interest in the 
theory of the hard sphere Bose gas (Dalfovo et ah, 1999). 

This chapter is a VMC and DMC study of a homogeneous, boson hard sphere fluid. 
Because they are bosons, there is no fixed node approximation in DMC, and the result can 
be made essentially exact. The approximations we must control are finite size effects and 
timestep error in DMC. 

Applying perturbation theory to the Bose hard sphere gas yields the low density expan- 
sion, 

£: = 27tp[l+Ci^ + C2plnp + C3p + ...] (6.1) 

where Ci = 128/(15V7t) and C2 = 8(47i/3 - v^). Mean field theory gives the linear term. 
Straight forward application of perturbation theory yields the C\ term, as was done by 
Huang, Yang, and Lee (Lee et al, 1957; Huang and Yang, 1957). The next higher or- 
der of perturbation theory diverges, but this was solved by including the depletion of the 
condensate by Beliaev (1958) to get C2. Wu (1959) obtained the same results via a resum- 
mation technique. Hugenholtz and Pines (1959) also obtained the logarithmic term. They 
also obtained the functional form for the series, which includes terms of the form p"/^ and 
p"/^log(p). 

Renormalization group techniques have recently been applied to examine this diver- 
gence (Castellani et al, 1997; Braaten and Nieto, 1997a,b). In addition, Braaten and Nieto 
have calculated C3 (Braaten and Nieto, 1997a,b). This term is also first term that depends 
on more that the two body 5- wave scattering length. It would require a solution to the three 
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body scattering problem, which makes an expUcit computation of that coupUng constant 
difficult. 

Hansen et al. (1971) used VMC on an 864 particle system to find the fluid-solid tran- 
sition density. Kales et al. (1974) used the more accurate Green's Function Monte Carlo 
(GFMC) to calculate the energy of the solid and liquid phase near freezing to determine 
the freezing density. They used 256 particles. In the liquid state they computed four points 
ranging in density from 0.16 to 0.27. 

Recently Giorgini et al. (1999) did DMC calculations on the homogeneous Bose gas, 
with various potentials, including the hard sphere potential. They used 500 particles and 
there was no mention of what DMC time step was used. These calculations were also used 
to make a fitting to the extended form of Eq. (6. 1) (using terms up to p^/^) by Boronat et al. 
(2000). 

There have been other attempts to get an equation of state by using various fitting tech- 
niques to combine the low density results and the GFMC results (Aguilera-Navarro et al, 
1987; Keller et al, 1996). As noted by Keller et al. (1996), the earlier work had an error 
and used only half the energy of the actual GFMC results. A new attempt at fitting the 
various functional forms was not done, but a new value for C3 was estimated. 

The Hamiltonian for this system is 

^=-^Ev2 + £v(|r,-r,-|) (6.2) 
^ i i<j 

where 

V r = <^ ^ (6.3) 
We have set ^ = m = a = 1 in all these calculations. 



6.1 Wave function 

An approximate wave function for the boson ground state that we use for a trial function is 

V = n/(^'7) (6-4) 

The individual functions are very similar to the ones used for hydrogen. The correlation 
function has a maximum range, r^ax, beyond which / is constant. In order to be compatible 
with periodic boundary conditions, we require r^ax < L/2. The "cusp" condition is that 
the wave function must vanish linearly when two spheres get close, /(r — > a) »: (r — a). 
(Unlike the electronic case, the slope is not fixed.) 
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Table 6.1: Variational parameters for hard sphere gas 
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A change of variables will simplify these expressions. Let x = r — a, Xmca = ^max — ^ 
and y = x/xmax- Now y lies in the range [0, 1]. In these variables, the boundary conditions 
on / are 

/(y = o) = 

/(y=i) = 1 (6.5) 
/(y=i) = /"(y=i) = o (6.6) 



(6.7) 



and the wave function is 



f{y)-^{y-y^)+i+y{y-'^?t.biTi{2y-i) (6.8) 

where are Chebyshev polynomials, and bi are variational parameters. 

The parameters for each density are given in Table 6.1. They were obtained from op- 
timization of the smallest system (A^ ~ 40). Then those same parameters were used for all 
system sizes at a particular density. 

The cost of computing of the wave function and local energy is dominated by calculat- 
ing the A' (A' — l)/2 interparticle distances. There are techniques for improving the scaling 
of computations of short range interactions to achieve o{N). We used the cell method 
(Allen and Tildesley, 1987). The simulation box is divided into cubic cells and a list is 
made of all the particles in each cell. For simplicity, consider the case where each cell is 
larger than the cutoff distance, rmax- Then a particle in a cell will have a non-zero inter- 
action only with the particles in the same cell and with particles in the neighboring cells. 
Particles in cells further away can be ignored. There is an overhead in computing and 
maintaining these lists. We used the cell method on systems with 500 particles and larger. 

There is a deficiency with the trial wave function that leads to undersampling when three 
particles are in close proximity. In DMC, this leads to a large number of branching walkers 
to compensate for the undersampling, which invariably causes problems with maintaining 
a stable population. One solution is to use a guiding function, which differs from the trial 
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Figure 6.1: Time step error for (a) p = 0.01 (b) p = 0.05 (c) p = 0.1 (d) p = 0.2 

wave function, for the diffusion and branching. Then a weight, \|/r/w> is associated with 
each sample point. In this case the simplest guiding function is to use is \|/g = \|/". We 
found that a = 0.9 was sufficient to make the population of walkers stable. 

The DMC timestep errors should be local, and hence the same for all system sizes. 
We did timestep extrapolation on systems with N = 40 particles. The timestep errors were 
found to be linear in T. The extrapolations to T = are shown in Figure 6.1. 

Green's Function Monte Carlo (GFMC) uses a different decomposition of the Green's 
function that DMC. The principle advantage of GFMC is that is has no time step error. We 
ran GFMC at p = .05 and = 40 to verify the timestep errors. The GFMC data point is 
shown at T = in Figure 6.1. The importance sampling in GFMC is not as effective as in 
DMC, hence the variance is larger, and GFMC is less efficient than DMC. GFMC has an 
efficiency of about 7, whereas DMC has an efficiency of 240 at T = .002 and an efficiency 
of 570 at T = .007. Even with computing at several timesteps to extrapolate to zero, DMC 
is more efficient that GFMC. 
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Figure 6.2: S(k) for (a) p = 0.05 (b) p = 0.2 

6.2 Finite Size Effects 



The main contribution to finite size effects in the energy is the long wavelength phonons. 
The functional form for their contribution depends on the small k behavior of the structure 
factor, S{k). The energy can be written as 



E^An k^dk e{k) 



(6.9) 



where kt, = In/L is the small k cutoff due to the finite box size. The energy of the phonon 
excitations at small k is proportional to S{k) (Feynman and Cohen, 1956). For a classical 
liquid, S{k)°^ 1 + (k^) . For a Bose fluid, S{k) should be proportional to k and S{k — > 0) = 
0. 

The VMC wave function has no long range part, and so we expect it to behave like a 
classical fluid at small k. Integrating Eq (6.9), we get E<xkl, which is the same as scaling by 
1/A^ for a fixed density. A more rigorous derivation of this scaling is given by Lebowitz and 
Percus (1961). The DMC algorithm should pick up the correct long wavelength behavior, 
leading to an S{k) that is linear in k. Integrating Eq. (6.9) we get E <x k^. This then gives 
us l/N^/^ scaling. 

The small k behavior for S(k) can be seen nicely for p = 0.05, shown in Figure 6.2a. The 
graph shows that the VMC structure factor appears quadratic as expected. The DMC mixed 
estimator shows the S{k) behaving linearly, but still not headed to zero. The extrapolated 
estimator looks like it over corrects and lowers S(k) too much. 

We did calculations for systems with 40, 108, 256, and 500 particles. For p = .2, 
additional VMC runs with N = 10^ and = 10"^ particles were done, and they are shown 
on the graph. The VMC energy nicely fits the 1 /N behavior, as shown in Figure 6.3. 
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Figure 6.3: VMC finite size effects for (a) p = 0.01 (b) p = 0.05 (c) p = 0.1 (d) p = 0.2 
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Figure 6.4: DMC finite size effects for (a) p = 0.01 (b) p = 0.05 (c) p = 0.1 (d) p = 0.2 
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Table 6.2: Energy extrapolated to infinite system size (in units of 



p 


VMC 


DMC 


Giorgini a/. (1999) 


.2 


6.0546(6) 


5.67(1) 




.1 


1.8744(4) 


1.809(1) 


1.8130(35) 


.05 


0.6917(1) 


0.6690(4) 


0.6690(5) 


.01 


9.144(2) X 10-2 


8.9896(8) X 10-2 


8.980(5) X 10-2 



The DMC finite size extrapolation is shown in Figure 6.4. At higher densities, the DMC 
energy does not appear to have a 1 /N^l^ dependence (or even \/N dependence). However, 
the data is too sparse and noisy to make a good determination as to what the functional 
form should be, so we fit it to 1 /N^l^. The final infinite system results are given in Table 
6.2. 

The long wavelength excitations will take a long time to sample, and their effect on 
the energy (and S{k)) may only be apparent with very long runs. And the time needed 
to sample them will increase with box size. The larger box sizes may be insufficiently 
converged, causing the energy to be too high. This may explain the apparent curvature in 
the wrong direction. 

There are several approaches for resolving the problem at larger box sizes. The first 
is simply to perform even longer runs to see if the energy drops. Similarly, data for more 
system sizes would be helpful in outlining the functional form of the finite size dependence. 
Finally, explicit long range correlations could be added to the wave function, of the form 
proposed by Chester and Reatto (1966). 

The energy versus density is shown in Figure 6.5, relative to the first order term , which 
is linear in the density. The low density exansion up to the C\ term is also shown, as well 
as up to C3, using the fitted value of 73.296 (Keller et ah, 1996). It is clear from the graph, 
and was noted by Hugenholtz and Pines (1959), that these additional terms by themselves 
do not help the expansion. 

Boronat et al. (2000) treated C2 and C3 as adjustable parameters and added two addi- 
tional terms, p^/^\og{x) and p^/^ They get a very good fit to the high density data, as seen 
in Figure 6.5. 

The results of Giorgini et al. (1999) are also given in Table 6.2 for the three common 
densities computed. Their data and ours agree within the error bars. 
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Figure 6.5: Energy vs. density 
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6.3 Distribution Functions and Condensate Fraction 



The two particle correlation function for all densities was calculated with a system size of 
256 particles. The results for g(r) using the extrapolated estimator are shown in Figure 6.6. 
We see the liquid shell structure developing as the density increases. 

The single particle density matrix is the projection of a many-body wave function on to 
a single particle space. It is defined by 

pl(r,r') = j dr2...drN\\f{r,r2,...,rN)\\f*ir',r2,...,rN) (6.10) 

In the homogeneous case, pi only depends on the distance between r and r'. The large 
r behavior of pi (or the ^ = behavior of its Fourier transform, n{k)) is the condensate 
fraction. At zero temperature, the many body wave function is in the ground state. Because 
of interations, not all the particles are in the single body zero momentum state (^ = plane 
wave state in this case). 

We used a method for sampling Pi (r) which was given by McMillan (1965). The con- 
densate fraction was obtained by integrating the single particle density matrix for distances 
greater than some cutoff, r^, chosen to be where Pi{r) had reached a plateau. 

The condensate fraction is given in Table 6.3 and shown in Figure 6.8. Also shown in 
the figure is the GFMC result of no = 0.095(1) at p = 0.2. The low density expansion is 
given by 



«o = 1 



8 



.1/2 



(6.11) 
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Table 6.3: Condensate fraction 



K 


VMC 






.2 


0.1009(5) 


0.0876(3) 


0.0743(8) 


.1 


0.307(2) 


0.2960(5) 


0.285(2) 


.05 


0.563(2) 


0.5401(4) 


0.517(8) 


.01 


0.834(1) 


0.826(1) 


0.818(2) 
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Figure 6.8: Condensate fraction vs. density 



0.15 



0.2 



Similar to their treatment of the energy, Boronat et al. (2000) added two additional terms, p 
and p^/^. Their fit does a good job at higher densities, where, as expected, the low density 
expansion misses the full extent of the depletion. 
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Chapter 7 
Hydrogen 



Hydrogen has been the subject of many experimental and theoretical studies. Theoretically, 
its simple electronic structure make it a favorable first target for various methods. Exper- 
imentally, hydrogen has been compressed by shock waves and also with a diamond anvil 
cell. We will present some CEIMC simulations and compare the results with those from 
one of the gas gun shock wave experiments. 

7.1 Experiment 

The high pressure experiments fall into two categories - transient compression from a shock 
wave or static compression from a diamond anvil cell. The shock wave experiments reach 
higher temperatures and pressures, but obtain more limited data. A high- velocity projectile 
hits a stationary target, inducing a shock wave in the target. The target is analyzed by 
the Hugoniot relations, derived by treating the shock wave as an ideal discontinuity and 
applying conservation of mass, momentum, and energy across it (Zel'dovich and Raizer, 
1966). The relations are then 

P-Pq = PoUsUp (7.1) 
P = PoUs/{us-Up) (7.2) 

E-Eo = i(yo-y)(P+Po) (7.3) 

where Eq, Pq, Vq, and po are the initial energy, pressure, volume, and density, respectively. 
The velocity of the shock wave is Us and Up is the velocity of the projectile driving the 
shock. 

There are a number of methods for accelerating a projectile (Cable, 1970), but the two 
most prominent methods for hydrogen targets are the two stage light gas gun (Nellis et al. 
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1983; Holmes et al, 1995; Weir et al, 1996; Nellis et al, 1999) and a large laser (Silva 
et al, 1997; Collins et al, 1998; Celliers et al., 2000). 

Recent advances make it possible to measure the temperature by light emission from 
the samples during compression (Holmes etal, 1995). Measurement of the conductivity is 
also possible, used in recent experiments to detect metallic hydrogen (Weir et al, 1996). 

The diamond anvil cell (DAC) is used to generate large static pressures. It has been 
used to study the fluid phase and several solid phases (Mao and Hemley, 1994). It has also 
been used to determine the melting curve for hydrogen up to 500K (Diatschenko et al., 
1985; Datchi a/., 2000). 

7.2 Theory 

Free energy models are typically based on the chemical picture, where molecules, atoms 
and various types of ions are all treated as different species of particles. Solving the phys- 
ical picture, where the only fundamental particles are electrons and protons, is much more 
difficult (see Path Integral Monte Carlo below). The free energy of the various phases is 
constructed from a variety of fits to experimental data, equation of state data from refer- 
ence systems (Lennard- Jones and hard sphere), and empirical and theoretical interaction 
potentials. 

One of the best known models is that of Saumon and Chabrier (Saumon and Chabrier, 
1991, 1992). Extensive tables for astrophysical use were published by Saumon et al. 
(1995). Another model was developed by Kitamura and Ichimaru (1998), to study the 
plasma (metal-insulator) transition. 

The Path Integral Monte Carlo (PIMC) method is in principle the best method for sim- 
ulations, since it treats both the electrons and protons quantum mechanically at non-zero 
temperature (Pierleoni et al., 1994; Magro et al., 1996; Militzer and Ceperley, 2000; Mil- 
itzer, 2000). The only major uncontrolled approximation is the location of the electron 
nodes, with problems and a solution similar to the fixed node method in DMC. Militzer 
and Pollock (2000) have made progress in improving the nodal structure used in these cal- 
culations. PIMC is based on breaking up a thermal density matrix into a product of high 
temperature components, and consequently it works well at high temperature and becomes 
less efficient as the temperature decreases. About 5000K is currently the lower limit for 
PIMC calculations. Our CEIMC simulation technique should make a nice complement to 
the PIMC method. 

There have also been path integral studies using empirical potentials, in order to exam- 
ine the quantum effects of the nuclei on the system (Wang et al., 1996, 1997; Cui et al. 
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1997; Chakravarty, 1999). 

The Car-Parrinello method has been used to simulate this system (Hohl et al., 1993; 
Kohanoffet al., 1991 \ Pfafenzeller and Hohl, 1997; Galli etal, 2000). At low temperature, 
it it necessary to treat the nuclei with path integrals (Biermann et al, 1998; Kitamura et al, 
2000). Some studies used LDA with the T point approximation (using only one fc-point 
for the integral over the Brillouin zone), which is not sufficient to converge the anisotropic 
behavior of the potential (Mazin and Cohen, 1995), and gives rise to unphysical planar 
structures (Kohanoff et al, 1997). 



7.3 Pressure and Kinetic Energy 



The pressure is computed by a virial estimator based on the potential and kinetic energies 

P = 1-[1{K) + {<P)] (7.4) 

where V is the volume and V is the potential energy. In these MC simulations, only the 
kinetic energy of the electrons is explicitly computed. The kinetic energy of the nuclei must 
also be added. 

We are only considering hydrogen in the molecular state, and further assume that rota- 
tional and vibrational motion can be separated. The characteristic temperature for quantum 
effects for rotational motion is about 85 K for H2 (Landau and Lifshitz, 1980). At our sim- 
ulation temperatures, we can use the classical expression for the rotational kinetic energy, 
£rot = '2-kT. 

The characteristic vibrational temperature for H2 is % = 6100 K, so it is necessary to 
use the quantum expression for the vibrational kinetic energy. It is 

= e-^T-i 

For D2, the characteristic temperature should be a factor of lower. 

Of course, these expressions are only valid for a free molecule. To truly treat the kinetic 
energy of the nuclei correctly in the interacting system, path integrals should be used for 
the nuclei. 



7.4 Individual Configurations 

We took several configurations from PIMC simulations at 5000K at two densities (r^ = 1 .86 
and Ts = 2.0), and compared the electronic energy using VMC, DMC, DFT-LDA, and some 
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empirical potentials. The DFT-LDA results were obtained from a plane wave code using 
an energy cutoff of 60 Rydbergs, and using the F point approximation (Ogitsu, 2000). 

The empirical potentials are the Silvera-Goldman (Silvera and Goldman, 1978) and 
the Diep-Johnson (Diep and Johnson, 2000a,b). To these we added the energy from the 
Kolos (Kolos and Wolniewicz, 1964) intramolecular potential to get the energy as a function 
of bond length variations. The Silvera-Goldman potential was obtained by fitting to low 
temperature experimental data, with pressures up to 20 kbar, and is isotropic. The Diep- 
Johnson potential is the most recent in a number of potentials for the isolated H2-H2 system. 
It was fit to the results of accurate quantum chemistry calculations for a number of H2-H2 
configurations. It is an anisotropic potential. 

The energies relative to an isolated H2 molecule are shown in Figure 7.1. The first 
thing we notice is that the classical potentials are more accurate than VMC or DFT. The 
Silvera-Goldman mostly does a good job of reproducing the DMC results. ^ Some of the 
failures of the SG potential can be attributed to the lack of anisotropy. The isolated H2-H2 
potential (Diep-Johnson) has much weaker interactions, compared with interactions in a 
denser system. 

The PIMC method itself gives an average energy of about 0.07(3) Ha for both densities. 
Improvements in the fermion nodes appear to lower the energy (Militzer and Pollock, 2000; 
Militzer and Ceperley, 2000), although the error bars are still quite large. There are also 
corrections to some internal approximations that lower the energy by an additional 0.02 
Ha (Militzer, 2000). These effects combined seem to bring the PIMC energy in rough 
agreement with the DMC energy. 

We used the Silvera-Goldman potential for pre-rejection. As seen in the Figure 7.1, 
it resembles the DMC potential even though it lacks anisotropy. A hybrid potential was 
created by Cui et al. (1997), taking the isotropic part from a potential that was fit to high 
density, and combining that with the anisotropic part from one of the isolated H2-H2 poten- 
tials. We did not pursue this approach for constructing a better potential for pre-rejection. 

7.5 Results 

We obtained results from simulations at three state points, two of which can be compared 
with the gas gun data of Holmes et al. (1995). The pressure is given in Table 7.1, with 
results from the gas gun experiments, the Saumon-Chabrier model, from simulations using 
the Silvera-Goldman potential, and from our CEIMC simulations. These state points are in 
the fluid molecular H2 phase. For the gas gun experiments, the uncertainties in the mear- 
^It should be noted that we are taking the SG potential far from the temperature range it was fit to. 
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Figure 7.1: Electronic energy for several configurations computed by several methods. 
The energy is relative to an isolated H2 molecule. 



sured temperatures are around 100-200K. The experimental uncertainties in the volume 
and pressure were not given, but previous work indicates that they are about 1-2% (Nellis 
etal, 1983). 

We did CEIMC calculations using VMC or DMC for computing the underlying elec- 
tronic energy, which are the first such QMC calculations in this range. The simulations at 
= 2.1 and = 1.8 were done with 32 molecules, and the simulations at Vs = 2.202 were 
done with 16 molecules. We see that the pressures from VMC and DMC are very similar, 
and that for = 2.1 we get good agreement with experiment. 

There is a larger discrepancy with experiment at = 2.202. The finite size effects are 
fairly large, especially with DMC. We also did simulations at = 2.1 with 16 molecules 
and obtained pressures of 0.264(3) Mbar for CEIMC-VMC and 0.129(4) Mbar for CEIMC- 
DMC. The Silvera-Goldman potential showed much smaller finite size effects than the 
CEIMC simulations, so we that the electronic part of the simulation is largely responsible 
for the observed finite size effects. 

The energies for all these systems are given in Table 7.2. The energy at = 2.1 with 
16 molecules for CEIMC-VMC is 0.0711(4) Ha and for CEIMC-DMC is 0.0721(8) Ha. 
The average molecular bond length is given in Table 7.3, and we see the bond length is 
compressed relative to the free molecule. The proton-proton distribution functions com- 
paring CEIMC-VMC and CEIMC-DMC are shown in Figure 7.2. The VMC and DMC 
distribution functions look similar, with the first large intramolecular peak around r = 1 .4 
and the intermolecular peak around r = 4.5. 
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Table 7.1: Pressure from simulations and shock wave experiments 
Ts V(cc/mol) T(K) Pressure (Mbar) 

Gasgun S-C S-G CEIMC-VMC CEIMC-DMC 
2.100 6.92 4530 0.234 0.213 0.201 0.226(4) 0.225(3) 
2.202 7.98 2820 0.120 0.125 0.116 0.105(6) 0.10(5) 
1.800 4.36 3000 - - 0.528 - 0.433(4) 



Table 7.2: Energy from simulations and models, relative to the ground state of an isolated 
H2 molecule. The H2 column is a single thermally excited molecule plus the quantum 
vibrational KE. 

Vs V(cc/mol) T(K) Energy (Ha/molecule) 

H2 S-C S-G CEIMC-VMC CEIMC-DMC 
2.100 6.92 4530 0.0493 0.0643 0.0689 0.0663(8) 0.0617(2) 
2.202 7.98 2820 0.0290 0.0367 0.0408 0.0305(8) 0.0334(9) 
1.800 4.36 3000 0.0311 - 0.0722 - 0.055(1) 



Table 7.3: Average molecular H2 bond length. The H2 column is a single thermally excited 
molecule in free space. 

r^ T(K) Average bond length (Bohr) 

H2 CEIMC-VMC CEIMC-DMC 
2.100 4530 1.550 1.431(1) 1.413(3) 
2.202 2820 1.486 1.443(1) 1.429(6) 
1.800 3000 1.492 - 1.410(1) 




(a) 




(b) 



Figure 7.2: Proton pair distribution function g{r) for (a) r^ — l.l and T=4530 K (b) = 
2.202 and T=2820 K 
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Figure 7.3: The proton pair distribution function, g(r), close to r ^ = 1.8 and T = 3000K. 

The CEIMC-VMC simulations at rs = 1.8 and 3000 K never converged. Starting from 
a liquid state, the energy decreased the entire simulation. Looking at the configurations re- 
vealed they were forming a plane. It is not clear whether it was trying to freeze, or forming 
structures similar to those found in DFT-LDA calculations with insufficient Brillouin zone 
sampling (Hohl etal, 1993; Kohanoff eM/. , 1997). The CEIMC-DMC simulations did not 
appear to have any difficulty, so is seems the VMC behavior was due to inadequacies of the 
wave function. 

Hohl et al. (1993) did DFT-LDA simulations at r, = 1.78 and T=3000K, which is very 
close to our simulations at = 1.8. The resulting proton-proton distribution functions are 
compared in Figure 7.3. The discrepancy between CEIMC and LDA in the intramolecular 
portion of the curve has several possible causes. On the CEIMC side, it may be due to 
an insufficiently long run or due to the molecular nature of the wave function, which does 
not allow dissociation. The deficiencies of LDA may account for it preferring fewer and 
less tightly bound molecules. LDA is known to overestimate the bond length of a free 
hydrogen molecule (Hohl et al, 1993), which would account for the shifted location of the 
bond length peak. 

7.6 Simulation analysis 

We also recorded some diagnostic information about the workings of the simulation, such 
as the average noise level, the relative noise parameter / = {^G)-^t/to, and a quantity called 
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the additional noise rejection ratio, T|. When a move is rejected with the penalty method, it 
is useful to recompute the acceptance decision with the same random number and without 
the noise penalty. If the move would have been accepted without the noise penalty, it 
is considered a rejection due to noise (as opposed to a rejection due to a trial move that 
increases the energy). This can be used to monitor the effects of the noise on the simulation. 
The additional noise rejection ratio is defined as 

Tl = ^^^^^ (7.6) 

N rioise_rej + ^ accept 

If T| is small, the noise is causing few additional rejections. If T| is 1/2, the noise is causing 
as many moves to be rejected as accepted. As r| — > 1, the noise is causing many moves to 
be rejected. 

Table 7.4 show the noise level ((3o) , the relative noise parameter, /, the additional noise 
rejection ratio, a ratio of the error level for the direct method and the two-sided method, 
and the time for a single quantum step. Looking at /, we see it is small for VMC and large 
for DMC. This is because of VMC optimization takes a proportionately larger amount of 
time in the VMC run than in the DMC run. 

We used the two-sided method for computing energy differences of trial moves with 
VMC, but only used the direct method with DMC. The column headed <y^/<yjs shows 
how the efficiency of computing the energy difference is improved by using the two-sided 
method. This improvement is only in the energy difference part of the total time, the op- 
timization time is unaffected (and is a large part of the run time, since the / parameter is 
small). In Chapter 3, there was an example showing that the two-sided method was not as 
effective for DMC as for VMC. But in these simulations the DMC runs have a much larger 
/ parameter, so even small reductions in the noise level would have an impact on the run 
time. 

Some of DMC energy differences had values of noise many times greater than the 
average, which may be due to an instability in the DMC algorithm. We removed these 
outliers in computing the average noise level. 

We tried the method for even lower temperatures with a simulation at T=800 K and 
= 1 .8 and it had a promising start, but after a while the acceptance ratio dropped and we 
were unable to get any usable data. 

7.7 Future Work 

The finite size effects in DMC need to be resolved. Using Ewald sums for computing the 
Coulomb interaction might help alleviate some of the finite size effects. Extending the 
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Table 7.4: Simulation quantities ordered according to average noise level, |3o. The time 
column is the time for a single quantum step in minutes on an SGI Origin 2000. N is the 
number of molecules in the simulation. 





T(K) 
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Pa 
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0.27 
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0.30 


0.15 
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2.28 


0.28 
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0.40 




92 


1.800 
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32 


DMC 


2.42 


13.1 


0.42 
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wave function to allow for dissociated molecules and to provide for ionization would help 
make the simulations more accurate, particularly at higher temperatures and pressures. 
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Chapter 8 
Conclusions 



In this work we have developed new methods for increasing the scope of QMC calcula- 
tions, and for increasing their efficiency. Variational Monte Carlo depends on optimizing 
parameters, but the presence of noise makes it difficult. We have examined several different 
kinds of optimization approaches and compared them. Further work should improve these 
methods even more. 

The boson hard sphere model is an important theoretical model. We have performed 
"computational experiments" to obtain the ground state energy of this model. The effects 
of long range correlation on the energy are masked by the current uncertainty in the infinite 
system size results. However, if more accurate results are desired, the nature of the long 
range correlations and their effect on the energy will need to be more clearly resolved. 

As a method for including increasingly more detailed and accurate physical effects in 
our simulations, we have developed the Coupled Electronic-Ionic Monte Carlo method. 
The central idea is simple, but several supporting developments were needed to make it 
computationally feasible. The penalty method enables use of energy differences with a 
noise level of approximately kgT , rather than needing noise smaller than some fraction 
of kgT to avoid bias. The two sided energy difference method can stably compute these 
energy differences. 

The CEIMC method was applied to a system of molecular hydrogen at a few state 
points. It shows promise for generating accurate simulation results. 
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Appendix A 
Determinant Properties 



The elements of the Slater matrix are Dij — ^j{ri). The Slater determinant looks like 

(|)i(ri) ... <Sfn{r\) 

\ ■-. ; (A.1) 
hifn) ■■■ <t>«(^n) 

We assume that the single particle orbitals depend only on a single coordinate (ie, no back- 
flow). 

The determinant of a matrix can be computed using the expansion by cof actors. This 
expands the determinant of aaN xN matrix into a sum of determinants of (A^ — 1) x 
(A^ — 1) matrices. As a recursive algorithm for computing the determinant, it is not very 
efficient, but for theoretical analysis, it is very useful for isolating the influence of a single 
row or column. 

Define the cofactors of a matrix M to be 



cij = {-iy+j\M, 



(A.2) 



where the matrix formed by Cij is called the cofactor matrix. The matrix Mij is an (A^ — 
1) X (A^ — 1) matrix formed by removing row i and column j from A. The determinant of 
A can then be written as 

\^\^Y,^k fk j = ^ikCik (A.3) 

for ^ = I...N. The transpose of the cofactor matrix is called the adjoint of A. Now the 
adjoint is related to the inverse by 



adj A = \A\A~ 



(A.4) 



To compute the ratio of determinants, expand the determinant of D{r'j^) in cofactors 
about the ^th row. Note that then the cofactors have no dependence on r^. 
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Wk)\ 



'LWk)\D{rk)\{D-\r,))ik 



Wk)\ 



\D{rk)\ 



If a move is accepted, the inverse matrix can be updated in 0{N^) time (rather than 
0{N^) for recomputing the inverse). The formula for updating an inverse if only a sin- 
gle row (or column) changes was given by Sherman and Morrison (1951). Let q be the 
ratio of determinants given above. Row k merely needs to be updated to reflect the new 
determinant, D^.^ = D^j /q . The other rows are updated as 



D 



-1 

ik 



(A.5) 
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Appendix B 

Elements of the Local Energy 



The wave function has the form 

\|/r=Dexp[-C/] (B.l) 
where D is the product of a spin up and a spin down Slater determinant and 

U = Y,u{rij). (B.2) 

The local energy is then 

In Diffusion Monte Carlo, we need the quantum force, Fq = Vln |\|/| . 

F^^l(^\-lWU (B.4) 



D J 



The derivatives of the Jastrow factors are 



^kUee = £";.(r/fc)^5— ^ (B.5) 

^ rji — R 

^kUne = "ne(^to) (B-6) 

a=l ^ka 

^kUee = Y.-<eink)+u'Unk) (B.l) 

i^k 

M 2 

"^k^ne = — (B.8) 
a=l ^ka 

For the gradient with respect to particle k, expand the determinant by cofactors about row 
k. Then the cofactors have no dependence. 

Vk\d\ 



\d\ 



= Yi'^k'^MWk' (B.9) 

i 
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Appendix C 
Cusp Condition 



When two Coulomb particles get close, the potential has a 1 /r singularity. The wave func- 
tion must have the correct form to cancel this singularity. First, consider an electron and a 
nucleus. The relevant part of the Schrodinger equation is 



2M " 



V- 

2 ' 



Ze 



2^ 



\|/ = Exi/ 



(C.l) 



where M is the nuclear mass and Z is the nuclear charge. Assume that M 3> me, so the first 
term can be ignored. Write the second term in spherical coordinates and we get 

(zeV+ V) = m (C.2) 

In order for the singularity to cancel at small r, the term multiplying 1/r must vanish. So 
we have 



-Ze' 



(C.3) 



If \|/ = e we must have c = Ze^. 

For the case of two electrons, the Schrodinger equation is 



4vf- 



1 



V5 + 1- 



(C.4) 



-v?, + — 



\|/ = £'\|/ 



(C.5) 



2^2- ri2. 
Switching to relative coordinates r\i = r\—ri gives us 

Electrons with unlike spins (no antisymmetry requirement) have an extra factor of 1 /2 in 
the cusp condition compared with the electron-nucleus case. So we have c = —e^/2. 

In the antisymmetric case, the electrons will be in a relative p state, reducing the cusp 
condition by 1/2, so c = — Having the correct cusp for like spin electrons gains very 
little in the energy or the variance, since the antisymmetry requirement keeps them apart 
anyway. 
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